Generate searchable PDFs with Azure Form Recognizer

Question

PDF documents are widely used in business processes. Digitally created PDFs are convenient to work with: their text can be searched, highlighted, and annotated.

Unfortunately, many PDFs are created by scanning paper or by converting images to PDF. These PDFs contain no digital text, so they cannot be searched.

The blog post linked below demonstrates how to convert such PDFs into searchable PDFs using Azure Form Recognizer and a small amount of easy-to-use code. The resulting searchable PDF can be stored anywhere, searched, and used for copy and paste.

https://techcommunity.microsoft.com/t5/ai-applied-ai-blog/generate-searchable-pdfs-with-azure-form-recognizer/ba-p/3652024

Do you have any use cases in mind for a searchable PDF? How has Form Recognizer helped you with your documents?
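To give a rough idea of what a conversion like this involves, here is a minimal sketch of one step: mapping an OCR word box onto PDF page coordinates so that hidden text can be drawn over the scanned image. This is not the code from the blog post. The names place_word and TextPlacement are hypothetical, and the sketch assumes the OCR result gives word polygons in inches with the origin at the top-left of the page, which, as far as I know, is how Form Recognizer reports PDF input.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

POINTS_PER_INCH = 72.0  # default PDF user space: 1 point = 1/72 inch


@dataclass
class TextPlacement:
    """Where to draw one OCR word on a PDF page (hypothetical helper type)."""
    text: str
    x: float       # left edge, in points from the left of the page
    y: float       # bottom edge, in points from the bottom of the page
    width: float   # box width in points
    height: float  # box height in points


def place_word(
    text: str,
    polygon_inches: Sequence[Tuple[float, float]],
    page_height_inches: float,
) -> TextPlacement:
    """Convert an OCR polygon (inches, top-left origin; an assumption)
    into an axis-aligned box in PDF points (bottom-left origin)."""
    xs = [p[0] for p in polygon_inches]
    ys = [p[1] for p in polygon_inches]
    left, right = min(xs), max(xs)
    top, bottom = min(ys), max(ys)
    return TextPlacement(
        text=text,
        x=left * POINTS_PER_INCH,
        # Flip the vertical axis: PDF measures y upward from the page bottom.
        y=(page_height_inches - bottom) * POINTS_PER_INCH,
        width=(right - left) * POINTS_PER_INCH,
        height=(bottom - top) * POINTS_PER_INCH,
    )


# Example: a word box on an 11-inch-tall page.
box = [(1.0, 1.0), (2.5, 1.0), (2.5, 1.25), (1.0, 1.25)]
placement = place_word("Invoice", box, page_height_inches=11.0)
print(placement)  # x=72.0, y=702.0, width=108.0, height=18.0
```

A PDF library would then draw each word invisibly at its placement on top of the page image, which is what makes the scanned text searchable and selectable.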

isspid · Answer

This is a very useful use case. The blog post presents an interesting solution, but the resulting PDF is much larger than the original. In my experience, when you are dealing with a very large number of PDFs, that size increase can become a serious problem.

OCRmyPDF (https://github.com/ocrmypdf/OCRmyPDF) does an amazing job of preserving the original size of the PDF, but at the moment it does not support Azure Form Recognizer as an engine.
