Forum Discussion
How to extract text from a PDF on Windows? It is a scanned PDF
Hi everyone,
I have several scanned PDF files that I need to extract text from. Some PDFs allow me to select and copy the text directly, but others seem to be scanned documents or locked in a way that makes copying difficult.
Could you please suggest a reliable way to extract text from a PDF document and save it as Word, TXT, or another editable format? I would prefer a method that works offline because I don't want to upload private PDF files to online converters.
What tools or built-in Windows options do you recommend for this on a Windows 10 PC?
8 Replies
- DevonZhang (Iron Contributor)
You need to convert the PDF to text on your Windows PC.
- Alinaim (Copper Contributor)
To extract text from a PDF for free, you can use the Snipping Tool: Microsoft has added an OCR feature directly to it. This allows you to capture a picture of the text in your scanned PDF and copy it as readable text.
Here is how to do it step-by-step:
1. Open your Scanned PDF: Use any PDF viewer to open the scanned document on your screen. Zoom in so the text is clear and readable.
2. Open the Snipping Tool:
Press Windows + Shift + S on your keyboard.
The screen will dim slightly, and a small bar will appear at the top with snipping mode options.
3. Capture the Text Area:
Click and drag your mouse to draw a rectangle around the text you want to extract.
When you release the mouse, a notification will pop up. Click on this notification to open the snip in the Snipping Tool editor.
4. Extract the Text:
In the Snipping Tool window, look for the "Text actions" button in the toolbar.
Click it. The tool will highlight all the recognized text in blue.
Click "Copy all text".
5. Paste the Result:
Press Ctrl + V in any document to paste the extracted text.
- JettStone (Iron Contributor)
OCRFeeder is an open-source tool you can use to extract text from scanned PDFs offline, offering advanced control over the OCR process for accurate results.
Instructions: Download and install the software from the official website, import the scanned PDF file, define the text recognition area, select the appropriate language, run the OCR process, and then export or copy the recognized text.
Its advantages include being open-source, offering full offline functionality, and providing advanced control options that allow users to customize OCR settings to improve recognition accuracy.
Its disadvantages include a steep learning curve, making it more suitable for advanced users; limited native support for Windows and macOS, with a primary focus on Linux users; and slower processing speeds when handling large, multi-page PDF files.
Notes:
- For low-resolution, skewed, or complex-layout scanned documents, you must manually select the text area to achieve optimal results.
- Windows/macOS users must compile the software themselves or use a third-party packaged version; the installation process is relatively cumbersome.
- Processing large, multi-page PDFs consumes significant system resources and may cause lag or response delays.
- After recognition is complete, the text must be manually proofread, particularly for errors in special characters, tables, and formulas.
This lets you extract text from scanned PDFs with precise control over the recognition process. It is suitable for users who need advanced OCR customization and are comfortable with more technical tools, especially on Linux systems.
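To speed up the manual proofreading mentioned in the notes, a small script can point out which lines of the exported text deserve a second look. This is only a heuristic sketch and not part of OCRFeeder: the function name `lines_to_proofread` and both checks are assumptions made for illustration, and legitimate tokens such as "A4" or "3rd" will be flagged too.

```python
import re

# Tokens that mix letters and digits (for example "c0ntact") are a common
# sign that the OCR engine confused similar-looking characters.
MIXED_TOKEN = re.compile(r"\b(?=\w*[A-Za-z])(?=\w*\d)\w+\b")

# The Unicode replacement character appears when a glyph could not be decoded.
REPLACEMENT_CHAR = "\ufffd"


def lines_to_proofread(text):
    """Return the 1-based numbers of lines that look suspicious."""
    flagged = []
    for number, line in enumerate(text.splitlines(), start=1):
        if REPLACEMENT_CHAR in line or MIXED_TOKEN.search(line):
            flagged.append(number)
    return flagged
```

The result is a list of line numbers to check by hand; it does not correct anything on its own.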
- YatesGriffin (Iron Contributor)
Microsoft Word includes built-in OCR functionality that allows you to extract text from scanned PDF files, so you can recognize text in scanned documents without additional software.
How to Extract Text from Scanned PDF
- Open Microsoft Word
- Click the File menu, select Open, and then choose the scanned PDF file you want to convert
- Confirm the prompt displayed by the system
- The program will perform automatic OCR analysis on the file's content
- Edit the recognized text, or save the file in Word format
Once the file is loaded, Word can run offline and extract text from scanned PDFs. Please note that Microsoft Office must be installed; Windows may come with a version that offers a one-month trial, but Office itself is not free.
Pros
- Built-in OCR functionality; no need to install additional tools
- Text extraction can be performed offline after loading
- Recognized text can be edited directly
Cons
- Requires Microsoft Office to be installed
- Recognition results are subpar for PDFs with complex layouts
- Elenorp (Copper Contributor)
Let me explain both situations so you know exactly how to extract text from PDF using Edge on your Windows machine.
Situation 1: Standard PDFs
If you open a PDF and can already highlight the text with your mouse cursor, you're looking at a standard text‑based PDF. In this case, extracting text is extremely straightforward:
1. Open the PDF in Microsoft Edge (it's the default PDF viewer on Windows)
2. Select the text by clicking and dragging your mouse over the content you want
3. Copy the text using either:
- Right‑click and select "Copy" from the menu
- The keyboard shortcut Ctrl + C
4. Paste it anywhere with Ctrl + V
Edge even provides a convenient mini‑menu that pops up when you select text, giving you quick access to copy, highlight, or add comments. It's fast, intuitive, and requires no extra software.
Situation 2: Scanned PDFs
This is where things get interesting, and where Edge's hidden superpower comes into play. Scanned PDFs are essentially images of pages, not actual text. Normally, you can't select or copy anything from them. However, Microsoft has been testing a feature that solves exactly this problem.
The Experimental OCR Feature
Microsoft Edge is currently testing an "OCR for PDF" feature that integrates Windows 11's built‑in OCR engine directly into the browser's PDF reader. Here's what you need to know:
How to enable it:
1. Type edge://flags into Edge's address bar and press Enter
2. Search for msPdfWindowsOcrCoverage
3. Change the setting from "Default" to "Enabled"
4. Restart Microsoft Edge
- HoltSawye (Iron Contributor)
gImageReader is a free, open-source tool that uses the Tesseract OCR engine, and it can extract text from scanned PDFs offline without any internet connection.
It lets you recognize text in scanned documents with high accuracy and multiple language support.
First, download and install the software from the official website.
Next, open the program, click File > Open, and select the scanned PDF file.
Click the Recognize button, select the desired language, and wait for the OCR process to complete.
You can then copy the recognized text directly or export it as a TXT file.
This method offers accurate OCR with many language options, so it works well for users who need to extract text from scanned PDFs in different languages.
The software is open-source, so there are no hidden fees. If you don't use English, you'll need to download the installation package for another language.
If you're looking for a highly accurate offline OCR solution, this is a reliable choice, although its interface is more technical than that of applications designed for general users.
P.S.
- Poor scan clarity and low contrast can significantly reduce recognition accuracy; we recommend optimizing scan quality in advance.
- Processing large or multi-page PDF files may be slow and could result in program response delays.
- When exporting to a TXT file, the original document’s formatting will be lost, and only plain text content will be retained.
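Because gImageReader is a front end for Tesseract, the same recognition can be scripted when there are many pages. The following is a minimal sketch, not something gImageReader itself provides. It assumes the Tesseract command-line program is installed separately and is on PATH, and that the PDF pages have already been exported as image files, since the `tesseract` command reads images rather than PDFs. The function names and the file name `page-001.png` are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, lang="eng"):
    """Return the argument list for one Tesseract CLI run.

    Using "stdout" as the output base makes Tesseract print the
    recognized text instead of writing a file itself.
    """
    return ["tesseract", str(image_path), "stdout", "-l", lang]


def ocr_image_to_txt(image_path, txt_path, lang="eng"):
    """OCR one page image and save the plain text as a UTF-8 TXT file."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("Tesseract is not installed or not on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        encoding="utf-8",
        check=True,  # raise CalledProcessError if Tesseract fails
    )
    Path(txt_path).write_text(result.stdout, encoding="utf-8")


if __name__ == "__main__":
    # Only prints the command that would be run for a hypothetical page image.
    print(build_tesseract_command("page-001.png", lang="deu"))
```

The `lang` value must match a language data file that is installed for Tesseract, for example `eng` or `deu`.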
- Zoeiur (Brass Contributor)
If you are looking for a legitimate, safe, and completely free way to extract text from a PDF on Windows without installing sketchy software, ShareX is an excellent choice. Just open your PDF, point, click, and paste.
Because ShareX works by looking at your screen, the process is slightly different from a standard PDF converter. However, it is very straightforward. Here is the step-by-step process:
- Open your scanned PDF: First, use any PDF viewer to open the scanned document on your screen.
- Activate ShareX's OCR: Instead of taking a regular screenshot, you will use ShareX's text recognition tool. You can find this by opening the ShareX main window and navigating to the Tools menu, where you will see an option for Text Recognition.
- Select the text region: Your cursor will change, allowing you to click and drag a box directly over the text in the scanned PDF that you want to copy. This is very precise.
- Get your text: ShareX will process the image inside your selected box, recognize the letters and words, and automatically copy that text to your computer's clipboard. You can then simply paste it into any document, email, or text file.
ShareX is a fantastic tool for this task, but understanding its small quirks will help you use it most effectively.
Excellent for Short or Medium Extracts: This method is perfect when you need to copy a few paragraphs, a recipe, a quote, or a technical command from a PDF. It is much faster than re-typing everything.
Not for Whole Book Conversion: It is not designed to automatically process all 300 pages of a scanned novel. The tool works best as an on-demand text grabber for the specific sections you select on your screen.
- Rhysin (Brass Contributor)
The most direct and private method is to use the built-in OCR engine that comes free with Windows. The engine itself is already part of the operating system, so no third-party OCR software is needed.
How it works: Windows has a native OCR engine called Windows.Media.Ocr that can extract text from images. It works entirely offline, meaning your documents never leave your computer.
The Tool: You can access this engine via Microsoft PowerToys, a free, open-source utility officially published by Microsoft for power users.
How to extract text from PDF:
1. Install PowerToys: Download and install Microsoft PowerToys.
2. Open your Scanned PDF: Use any PDF viewer (like Microsoft Edge or Adobe Reader) to open the scanned document on your screen.
3. Activate Text Extractor: Press the activation shortcut: Win + Shift + T. A transparent overlay will appear on your screen.
4. Select the Text: Click and drag your mouse to draw a box over the area of text you want to copy.
5. Paste the Text: The text is automatically copied to your clipboard. You can now paste it (Ctrl + V) into any document or text editor.
Start with Microsoft PowerToys if you want to extract text from a PDF. It is an official Microsoft tool, works entirely offline, and is well suited for quickly extracting text from any scanned document you see on your screen.
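Since the extracted text lands on the clipboard, collecting several captures into one TXT file can be scripted instead of pasting each one by hand. This is a minimal sketch and not a PowerToys feature: the function names and the file name `extracted.txt` are hypothetical, and reading the clipboard through Python's bundled `tkinter` module assumes a normal desktop session.

```python
def append_snippet(txt_path, snippet):
    """Append one OCR snippet to a UTF-8 text file, followed by a blank line.

    Returns False (and writes nothing) when the snippet is empty.
    """
    cleaned = snippet.replace("\r\n", "\n").strip()
    if not cleaned:
        return False
    with open(txt_path, "a", encoding="utf-8", newline="\n") as handle:
        handle.write(cleaned + "\n\n")
    return True


def read_clipboard_text():
    """Return the current clipboard text.

    tkinter is imported here so that append_snippet still works where no
    display is available. clipboard_get raises tkinter.TclError when the
    clipboard is empty or does not hold text.
    """
    import tkinter

    root = tkinter.Tk()
    root.withdraw()  # no visible window is needed
    try:
        return root.clipboard_get()
    finally:
        root.destroy()
```

After each capture, calling `append_snippet("extracted.txt", read_clipboard_text())` would add the newest text to the end of the file.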