Forum Discussion

EricStarker
Community Manager
Oct 21, 2022

Generate searchable PDFs with Azure Form Recognizer

PDF documents are widely used in business processes. Digitally created PDFs are very convenient to use. Text can be searched, highlighted, and annotated.


Unfortunately, a lot of PDFs are created by scanning or converting images to PDFs. There is no digital text in these PDFs, so they cannot be searched.


The blog post linked below shows how to convert such PDFs into searchable PDFs using Azure Form Recognizer and a small amount of easy-to-use code. The resulting searchable PDF can be stored anywhere, and its text can be searched, copied, and pasted.


https://techcommunity.microsoft.com/t5/ai-applied-ai-blog/generate-searchable-pdfs-with-azure-form-recognizer/ba-p/3652024
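The full implementation is in the blog post. As a rough sketch of one common way to make a scanned page searchable (not necessarily the blog's exact code), the scanned image stays as the visible page content, and each recognized word is drawn on top of it as invisible text (PDF text rendering mode 3). Form Recognizer reports word polygons for PDF input in inches with a top-left origin, while PDF positions are in points (1/72 inch) with a bottom-left origin, so the coordinates must be converted and the y axis flipped. The `Word` class, the flat polygon layout, and the `/F1` font resource name are assumptions for illustration only; they are not the SDK's types or the blog's code.

```python
from dataclasses import dataclass

POINTS_PER_INCH = 72  # default PDF user-space unit is 1/72 inch


@dataclass
class Word:
    text: str
    # Assumed: flat [x1, y1, ..., x4, y4] polygon in inches with a top-left
    # origin, similar to the Form Recognizer REST response for PDF input.
    polygon: list[float]


def escape_pdf_string(s: str) -> str:
    # Backslash and parentheses must be escaped inside a PDF literal string.
    return s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def invisible_text_layer(words: list[Word], page_height_in: float) -> str:
    """Build a PDF content-stream fragment that draws each word invisibly.

    Assumes a font resource named /F1 (hypothetical) is defined on the page
    and that word text is plain ASCII; real code must embed a suitable font.
    """
    ops = ["BT", "3 Tr"]  # 3 Tr = text rendering mode: neither fill nor stroke
    for w in words:
        xs, ys = w.polygon[0::2], w.polygon[1::2]
        # Flip y for the bottom-left PDF origin; use the box's bottom edge
        # as an approximate baseline and its height as the font size.
        x_pt = min(xs) * POINTS_PER_INCH
        y_pt = (page_height_in - max(ys)) * POINTS_PER_INCH
        size_pt = (max(ys) - min(ys)) * POINTS_PER_INCH
        ops.append(f"/F1 {size_pt:.2f} Tf")
        ops.append(f"1 0 0 1 {x_pt:.2f} {y_pt:.2f} Tm")
        ops.append(f"({escape_pdf_string(w.text)}) Tj")
    ops.append("ET")
    return "\n".join(ops)
```

Because the text layer is invisible, the page still looks exactly like the scan, but PDF viewers can search, select, and copy the words. Merging this fragment into the page (and embedding a font for non-ASCII text) is left to a PDF library.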


Do you have any use cases in mind for a searchable PDF? How has Form Recognizer helped you with your documents?


  • isspid
    Copper Contributor
    This is a very useful use case. The solution in the blog post is very interesting, but the resulting PDF is much larger than the original. In my experience, when you are dealing with a very large number of PDFs, the file size can become a serious problem.

    OCRmyPDF (https://github.com/ocrmypdf/OCRmyPDF) does an amazing job of preserving the original size of the PDF, but at the moment it does not support Azure Form Recognizer as an OCR engine.
