Copper Contributor

Oct 12, 2019

SharePoint Capability to do OCR in PDF Documents

Hi. We have a requirement that all documents (PDF, Word, etc.) uploaded to SharePoint must be searchable, including the text inside their embedded images. I did find ...

OCR

SharePoint DMS

FedericoPorceddu82

MVP

Dec 03, 2019

Hi Rizwan Ansari,

SharePoint extracts text from PDFs and images, so you can find that content with the out-of-the-box (OOB) search. However, you can't customize this behavior; you have to use it as it is.

If you need to customize your OCR experience without using third-party (3P) tools, you can consider a solution like the one I described in my blog, which uses SharePoint, Flow, and Azure Cognitive Services.
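
For readers who want a feel for the Cognitive Services side of such a solution, here is a minimal Python sketch. It is not the code from the blog post. It assumes the Computer Vision Read API (v3.2), where a document is POSTed to `/vision/v3.2/read/analyze` and the finished result is JSON with `analyzeResult.readResults[].lines[].text`. The helper names `build_read_request` and `extract_text` are hypothetical, and the sketch makes no network call itself.

```python
from typing import Any, Dict


def build_read_request(endpoint: str, key: str) -> Dict[str, Any]:
    """Describe the POST that submits a document for OCR.

    Assumption: Computer Vision Read API v3.2. The service is expected to
    answer 202 with an Operation-Location header to poll for the result.
    """
    return {
        "url": endpoint.rstrip("/") + "/vision/v3.2/read/analyze",
        "headers": {
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/octet-stream",
        },
    }


def extract_text(read_result: Dict[str, Any]) -> str:
    """Join every recognized line from a finished Read result into one string."""
    if read_result.get("status") != "succeeded":
        raise ValueError("Read operation has not succeeded yet")
    pages = read_result.get("analyzeResult", {}).get("readResults", [])
    lines = [line["text"] for page in pages for line in page.get("lines", [])]
    return "\n".join(lines)
```

In a flow, the extracted string could then be written back to a text column on the SharePoint item so that search can index it; which column to use is a design choice for your own library.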

Cheers,

Federico

Jason E. Heiser

Iron Contributor

Oct 16, 2020

What I'm reading from this diagram, though, is that the actual OCR for PDFs can only be accomplished by running the item through Power Automate and processing either with Cognitive Services or some other OCR engine like Muhimbi or Aquaforest.

Am I correct? What about the run limits in Power Automate? A user could potentially upload thousands of PDFs in a week, and I'd hate to hit the run limit...
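
One rough way to reason about this concern is a back-of-the-envelope estimate. The Python sketch below assumes uploads are spread evenly over the week and that one flow run handles a fixed number of files. The parameters `daily_run_limit` and `files_per_run` are placeholders, not real Power Automate figures; the actual limits depend on your license and should be taken from your own plan.

```python
import math


def runs_per_day(uploads_per_week: int, files_per_run: int = 1) -> int:
    """Estimate daily flow runs, assuming uploads are spread evenly over 7 days."""
    runs_per_week = math.ceil(uploads_per_week / files_per_run)
    return math.ceil(runs_per_week / 7)


def fits_budget(uploads_per_week: int, daily_run_limit: int,
                files_per_run: int = 1) -> bool:
    """Check the estimate against a daily run limit (placeholder value)."""
    return runs_per_day(uploads_per_week, files_per_run) <= daily_run_limit
```

For example, `runs_per_day(7000)` gives 1000 runs per day when every upload triggers its own run, while `runs_per_day(7000, 10)` gives 100 if a run could process ten files at a time.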

FedericoPorceddu82

MVP

Nov 09, 2020

Hi Jason E. Heiser,

Flow in Power Automate is a way to build personal flows, so your statement is correct 🙂

When designing the solution, you can consider using dedicated flows with a "per-flow" license, or a Logic App on Azure.

In this example I wanted to highlight the power of a low-code/no-code solution, but it is meant for personal use, not enterprise use.

Thanks for your comment 🙂

Cheers,

Federico