Forum Discussion
JoyceBeatty
Sep 09, 2025 | Iron Contributor
How can I extract text from an image on my computer?
Hi everyone, I have some images that contain important text, and I plan to extract the text into an editable format (like Word or plain text). I know this is usually done with OCR, but I'm not sure ...
Jedidiahin
Sep 09, 2025 | Iron Contributor
pytesseract is a Python wrapper for the Tesseract OCR (Optical Character Recognition) engine. Tesseract itself is an open-source tool, originally developed at Hewlett-Packard and later sponsored by Google, that reads text from images and converts it into editable text. Since Tesseract is written in C++, it’s not directly “Python-friendly.” That’s where pytesseract comes in: it acts as a bridge, calling the Tesseract program from your Python script and handing back the recognized text. Note that Tesseract has to be installed separately; pip install pytesseract only installs the wrapper.
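Because pytesseract only wraps the Tesseract program, it's worth checking that Python can actually find that program before running any OCR. Here is a rough sketch of one way to do it. The helper names find_tesseract and configure_pytesseract are made up for this example, and the Windows path in the usage comment is just a common install location, not something guaranteed on your machine.

```python
import shutil


def find_tesseract():
    """Return the full path of the 'tesseract' executable on PATH, or None."""
    return shutil.which("tesseract")


def configure_pytesseract(explicit_path=None):
    """Point pytesseract at a Tesseract executable.

    Uses explicit_path if given, otherwise whatever is found on PATH.
    Returns the path that was set, or None if nothing was found.
    """
    path = explicit_path or find_tesseract()
    if path is None:
        return None
    # Imported here so find_tesseract() works even without pytesseract installed
    import pytesseract
    pytesseract.pytesseract.tesseract_cmd = path
    return path


# Example usage (the path below is an assumption, adjust it to your install):
# configure_pytesseract(r"C:\Program Files\Tesseract-OCR\tesseract.exe")
```

If find_tesseract() returns None, Tesseract is either not installed or not on your PATH, and passing the full path explicitly is usually the quickest fix.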
With pytesseract, you can:
- Extract text from images (JPG, PNG, etc.).
- Process scanned PDFs by converting pages to images first.
- Batch-process multiple files with just a few lines of code.
- Integrate OCR into bigger Python projects like data scraping, automation, or machine learning.
Sample code snippet to extract text from images on Windows or Mac:
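import os
from PIL import Image
import pytesseract

folder = "images"  # folder containing the images to read

for file in os.listdir(folder):
    # Match extensions case-insensitively so "SCAN.PNG" is not skipped
    if file.lower().endswith((".png", ".jpg", ".jpeg")):
        img_path = os.path.join(folder, file)
        # Run OCR on the image and get the recognized text as a string
        text = pytesseract.image_to_string(Image.open(img_path))
        print(f"\n--- {file} ---\n{text}")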
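Since the goal is an editable format, here is a small variation that writes each image's text to a plain .txt file instead of printing it. Treat it as a sketch: the function names find_images and ocr_folder_to_text_files and the ocr_output folder are placeholders I picked for the example, and it assumes Pillow, pytesseract, and Tesseract are all installed.

```python
from pathlib import Path

IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def find_images(folder):
    """Return image files in 'folder', sorted, matching extensions case-insensitively."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


def ocr_folder_to_text_files(folder, out_folder):
    """OCR every image in 'folder' and write one .txt file per image to 'out_folder'."""
    # Imported here so find_images() can be used without the OCR packages
    from PIL import Image
    import pytesseract

    out_dir = Path(out_folder)
    out_dir.mkdir(parents=True, exist_ok=True)

    written = []
    for img_path in find_images(folder):
        with Image.open(img_path) as img:
            text = pytesseract.image_to_string(img)
        # Keep the full image name so scan.png and scan.jpg don't collide
        out_path = out_dir / (img_path.name + ".txt")
        out_path.write_text(text, encoding="utf-8")
        written.append(out_path)
    return written


# Example usage (folder names are placeholders):
# for path in ocr_folder_to_text_files("images", "ocr_output"):
#     print(f"Wrote {path}")
```

The resulting .txt files open in any text editor, and you can paste their contents into Word from there.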