How can I extract text from an image on my computer?

Sep 09, 2025

pytesseract is a Python wrapper for the Tesseract OCR (Optical Character Recognition) engine. Tesseract itself is an open-source tool, originally developed at Hewlett-Packard and later sponsored by Google, that reads text from images and converts it into editable text. Since Tesseract is written in C++, it isn't directly “Python-friendly.” That's where pytesseract comes in: it acts as a bridge, calling the installed Tesseract program from inside your Python scripts and handing the recognized text back to you.
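
To make the "bridge" idea concrete, here is a rough sketch of what calling the Tesseract program from Python can look like without any wrapper. This is a simplified illustration, not pytesseract's actual implementation; the function names build_tesseract_command and ocr_via_cli are made up for this example, and it assumes the tesseract executable is installed and on your PATH.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, lang="eng", binary="tesseract"):
    """Build the argument list for a Tesseract CLI call.

    Passing "stdout" as the output base tells Tesseract to print the
    recognized text instead of writing it to a file.
    """
    return [binary, str(image_path), "stdout", "-l", lang]


def ocr_via_cli(image_path, lang="eng"):
    """Run the Tesseract executable directly and return the text it prints."""
    # Assumption: the executable is named "tesseract" and is on PATH.
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,  # raise if Tesseract exits with an error
    )
    return result.stdout
```

In practice pytesseract saves you from writing this plumbing yourself, and it also accepts PIL images directly rather than only file paths.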

With pytesseract, you can:

Extract text from images (JPG, PNG, etc.).
Process scanned PDFs by converting pages to images first.
Batch-process multiple files with just a few lines of code.
Integrate OCR into bigger Python projects like data scraping, automation, or machine learning.

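Sample code snippet to extract text from images on Windows or Mac (the Tesseract engine itself must be installed separately, in addition to the pytesseract and Pillow packages):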

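import os
from PIL import Image
import pytesseract

folder = "images"  # folder containing the images to read

# Process files in a stable, alphabetical order
for file in sorted(os.listdir(folder)):
    # Match extensions case-insensitively so "SCAN.JPG" is not skipped
    if file.lower().endswith((".png", ".jpg", ".jpeg")):
        img_path = os.path.join(folder, file)
        # Open the image, run OCR, and close the file afterwards
        with Image.open(img_path) as img:
            text = pytesseract.image_to_string(img)
        print(f"\n--- {file} ---\n{text}")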
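
If you would rather keep the results than print them, the same loop can write one text file per image. The following is only a sketch under a few assumptions: the helper names default_ocr and ocr_folder_to_text_files are hypothetical, the output naming scheme (image name plus ".txt") is just one reasonable choice, and the OCR step is passed in as a function so you can swap it out or try the loop without Tesseract installed.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def default_ocr(path):
    """OCR a single image file with pytesseract."""
    # Imported here so the rest of the module loads even if the
    # OCR packages are not installed yet.
    from PIL import Image
    import pytesseract

    with Image.open(path) as img:
        return pytesseract.image_to_string(img)


def ocr_folder_to_text_files(folder, ocr=default_ocr):
    """OCR every image in `folder` and save the text next to it.

    Returns the list of text files that were written.
    """
    written = []
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip anything that is not a supported image
        # "scan.png" -> "scan.png.txt", so "scan.png" and "scan.jpg"
        # never overwrite each other's output.
        out_path = path.with_name(path.name + ".txt")
        out_path.write_text(ocr(path), encoding="utf-8")
        written.append(out_path)
    return written
```

Calling ocr_folder_to_text_files("images") would then process the same folder as the snippet above and leave a .txt file beside each image.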
