Making scanned-image PDFs searchable in SharePoint using Adobe PDF Services

Hi all,

I'm trying to convert PDF files made up of scanned images into searchable PDFs using Adobe OCR. I came across "Make PDFs Searchable (OCR) After Importing into SharePoint", a post on the Adobe developer blog by Ben Vanderberg, which gives a step-by-step guide to creating a flow with Power Automate and Adobe PDF Services. However, I've run into two problems:

  • First, a condition in the flow only lets through files that are not PDFs.
  • Second, the "Convert document to PDF" stage fails with the error "The provided input content is not valid: 'The multipart content body cannot be null.'."

Has anyone been successful with this approach or found an alternative solution?

Thanks

Kamil

3 Replies

Hi Kamil,

The template flow you are using has a condition that checks whether a file is not a PDF and, if so, converts it to PDF before the OCR step. If the file is already a PDF, the conversion step is skipped and the file goes directly into the "Create Searchable PDF using OCR" step. The logic should work for any file type. You can delete the condition if you are sure that only PDFs will be received.
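
To make the branching concrete, here is a rough sketch of the same logic in Python. It only illustrates the control flow and is not the template's actual definition; `convert_to_pdf` and `ocr_pdf` are hypothetical stand-ins for the "Convert document to PDF" and "Create Searchable PDF using OCR" actions.

```python
from pathlib import PurePath
from typing import Callable


def is_pdf(filename: str) -> bool:
    """Return True if the file name has a .pdf extension (case-insensitive)."""
    return PurePath(filename).suffix.lower() == ".pdf"


def make_searchable(
    filename: str,
    content: bytes,
    convert_to_pdf: Callable[[str, bytes], bytes],  # hypothetical stand-in
    ocr_pdf: Callable[[bytes], bytes],              # hypothetical stand-in
) -> bytes:
    """Convert non-PDF files first, then run OCR on the resulting PDF."""
    if not is_pdf(filename):
        # Only non-PDF inputs go through the conversion step.
        content = convert_to_pdf(filename, content)
    # Every file, original PDF or freshly converted, reaches the OCR step.
    return ocr_pdf(content)
```

So a file that is already a PDF never touches the conversion step; it simply falls through to OCR.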

The error message could be caused by a bad file or by an incorrect input somewhere in the flow. Screenshots of the flow would help debug this further.
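
Since the error says the multipart body is null, one quick thing to rule out is that the file content reaching the action is missing or empty. A minimal sketch of that check in Python, assuming you can get hold of the file's raw bytes for testing; `describe_content` is a hypothetical helper, not part of Power Automate or Adobe PDF Services.

```python
from typing import Optional


def describe_content(content: Optional[bytes]) -> str:
    """Classify file content before it is sent to a conversion or OCR step."""
    if content is None:
        return "missing"  # nothing was passed in at all
    if len(content) == 0:
        return "empty"    # a zero-byte body
    # A well-formed PDF file begins with the header "%PDF-".
    if content.startswith(b"%PDF-"):
        return "pdf"
    return "other"
```

A result of "missing" or "empty" would point to the input wiring rather than the file itself, while "pdf" would mean the file is already a PDF and should not need the conversion step.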

If you are still having issues, I can recommend https://www.aquaforest.com/resources/aquaforest-flow. We have a PDF OCR step, a free trial subscription, and a very active support desk that can help you get set up and answer any questions you have.

Kind Regards,

Alex
PSPDFKit

@Alex_Donhou Thanks for your response and apologies for the late reply. Your Aquaforest OCR sounds interesting. Should I just sign up or are you able to walk me through the process?

@KSoree

If you have bulk OCR requirements in SharePoint, solutions in Power Automate are often not cost-effective.

Encodian Indxr provides low, fixed-cost, unlimited OCR for bulk requirements. It can run on automated schedules to achieve automated bulk OCR at a fixed price.

It comes with a free-to-use audit tool that determines how many files are missing text layers (even on a page-by-page basis), and a supportive help desk if you need any information.