Solution for OCR Forms Processing and Entity Extraction

I need to propose a solution that ingests scanned copies of forms and extracts entities and fields.

For example, extract Invoice Number, Date, Customer Name, Product Names, Raw Material Names, etc. from an invoice form.

There are many third-party companies that offer to digitize your data. I was wondering what a good solution architecture would be if we were to implement this on Azure.

Should we consider Azure Computer Vision, Custom Vision, Entity Linking Intelligence Service API, Named Entity Extraction, LUIS, ML, etc.? Any suggestions on how we can approach this?

3 Replies

Hi,

Any outcomes from your research?

Thanks in advance!

Frank

@Vinay Bhatia I was wondering if you found anything on this issue. I have a similar goal and am trying to use the Computer Vision API from Cognitive Services.

Thank you.

@Deleted It has been more than a year since we built the solution. From what I can recollect:
We used an image classifier to determine whether a scanned image was of type Form 1 or Form 2.
We then used the Azure Computer Vision API to extract the text within the image.
Finally, we used a combination of LUIS and some regex string manipulation to extract the field values.
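
For anyone trying something similar, the regex part of the last step might look roughly like the sketch below. This is only an illustration, not the code from the original solution: the field names, the patterns, the extract_fields helper, and the sample text are all assumptions, and it takes the OCR output as a plain string rather than calling any Azure service.

```python
import re

# Hypothetical patterns for one form type. Real forms will need patterns
# tuned to their own labels and layouts.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"Invoice\s*(?:No\.?|Number|#)\s*[:\-]?\s*([A-Z0-9\-]+)", re.IGNORECASE
    ),
    "date": re.compile(
        r"Date\s*[:\-]?\s*(\d{1,2}[/\-.]\d{1,2}[/\-.]\d{2,4})", re.IGNORECASE
    ),
    "customer_name": re.compile(
        r"Customer\s*(?:Name)?\s*[:\-]\s*(.+)", re.IGNORECASE
    ),
}


def extract_fields(ocr_text):
    """Return a dict of field name -> extracted value (None if not found)."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        fields[name] = match.group(1).strip() if match else None
    return fields


# Made-up sample standing in for the text returned by the OCR step.
sample = """ACME Supplies
Invoice No: INV-2041
Date: 12/03/2019
Customer Name: Contoso Ltd
"""

if __name__ == "__main__":
    print(extract_fields(sample))
```

If the classifier distinguishes several form types, one option is to keep a separate pattern table per type and pick the table based on the predicted type before running the extraction.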