Microsoft Foundry Blog

Generate searchable PDFs with Azure Form Recognizer

Microsoft

Oct 17, 2022

Important update: Azure Document Intelligence (formerly Form Recognizer) now supports generating searchable PDFs, starting with the 2024-11-30 API (4.0 GA). Please read: Searchable PDF - Azure Docu...

Updated Jan 30, 2025

azure ai document intelligence

azure ai services

anatolip

Microsoft


anatolip

Microsoft

Jan 04, 2024

nickstiv, I agree that a fixed DPI, color space, and compression may not be the optimal solution for some use cases. As you pointed out, some of this information can be detected from the PDF file content. An alternative solution is to add a text layer on top of the existing PDF: create a new PDF page containing only "invisible" text and merge it with the existing page. This approach is slightly less generic than the rendering approach in this blog post. It may duplicate text inside the PDF if the original PDF already contains some digital text. It is also slightly less reliable, since merging PDF content sometimes causes issues, especially for rotated pages (depending on the method or packages used). But if your PDFs are not very complex and are entirely image-based, this approach lets you keep the original size of the PDF, plus some additional size for the text content.
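The "invisible text page" idea from the reply above can be sketched as follows. This is a minimal, hypothetical sketch, not code from the original post: `Word`, `invisible_text_stream`, and `escape_pdf_string` are illustrative names, and it assumes axis-aligned word boxes given in inches with a top-left origin (Document Intelligence reports PDF coordinates in inches, but as polygons), a Helvetica font declared on the overlay page under the assumed resource name `/F1`, and ASCII-only text. It only builds the content stream for the overlay page. Merging that page onto the original (for example with pypdf's `PageObject.merge_page`) is left to a PDF library, and pages with a `/Rotate` entry would need their coordinates transformed first.

```python
from dataclasses import dataclass

POINTS_PER_INCH = 72.0
# Assumption: an average glyph width of about 0.5 em, used only to stretch
# each word to its box width. Real code would use the font's actual metrics.
AVG_CHAR_WIDTH_EM = 0.5


@dataclass
class Word:
    """Hypothetical recognized word: axis-aligned box in inches, top-left origin."""
    text: str
    left: float
    top: float
    width: float
    height: float


def escape_pdf_string(text: str) -> str:
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def invisible_text_stream(words: list[Word], page_height_in: float,
                          font_name: str = "/F1") -> str:
    """Build a PDF content stream that draws each word with text render mode 3.

    Render mode 3 (neither fill nor stroke) makes the text invisible while it
    stays selectable and searchable once the page is overlaid on the image.
    """
    ops = ["BT", "3 Tr"]  # begin a text object and switch to invisible rendering
    for w in words:
        font_size = w.height * POINTS_PER_INCH
        if font_size <= 0 or not w.text:
            continue  # skip empty or degenerate boxes
        x = w.left * POINTS_PER_INCH
        # PDF user space has a bottom-left origin: flip y and use the box bottom.
        y = (page_height_in - (w.top + w.height)) * POINTS_PER_INCH
        natural_width = len(w.text) * AVG_CHAR_WIDTH_EM * font_size
        # Horizontal scaling (percent) so the word roughly spans its box width.
        scale = 100.0 * (w.width * POINTS_PER_INCH) / natural_width
        ops.append(f"{font_name} {font_size:.2f} Tf")
        ops.append(f"{scale:.2f} Tz")
        ops.append(f"1 0 0 1 {x:.2f} {y:.2f} Tm")
        ops.append(f"({escape_pdf_string(w.text)}) Tj")
    ops.append("ET")  # end the text object
    return "\n".join(ops)


if __name__ == "__main__":
    sample = [Word("Invoice", 1.0, 1.0, 1.2, 0.25), Word("(total)", 1.0, 2.0, 1.0, 0.2)]
    print(invisible_text_stream(sample, page_height_in=11.0))
```

Because the text is never painted, it does not alter the scanned image, and the original image data is left untouched, which is why this route keeps the file close to its original size.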