Rahul1202
Feb 04, 2025 · Copper Contributor
Using AI to convert unstructured information to structured information
We have a use case that requires extracting information from various document types, such as Excel, PDF, and Word files, and converting it into structured data. The source data comes in different formats.
We started building this with AI Builder, hit a roadblock, and are now exploring Copilot Studio.
It would be great if someone could point us in the right direction.
What technology stack should we consider for this use case?
Thanks in advance for any pointers.
- jenapravat · Copper Contributor
Extracting insights from documents is key to both analytical and research (digital) products. Information extraction can be broadly split into two tasks: (i) extracting the text from the document, and (ii) extracting entities from the resulting free text.
For PDFs, images, and similar files, the Azure AI Document Intelligence service is a good choice: it is powerful at extracting all the text (OCR) and can also extract specific entities to suit business needs. If you only need to extract entities from text you already have, Named Entity Recognition in the Azure AI Language service will be very helpful.
Which service fits better depends on the business use case.
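The two-task split described in this reply can be sketched as two composable stages. This is a minimal sketch that assumes stage (i) receives plain UTF-8 text; `decode_plain_text`, `parse_key_values`, and `run_pipeline` are hypothetical names. In practice, stage (i) would call an OCR/layout service such as Document Intelligence and stage (ii) an NER service; neither call is shown here.

```python
import re
from typing import Callable, Dict

# Stage (i) turns raw document bytes into plain text;
# stage (ii) turns plain text into named fields.
TextExtractor = Callable[[bytes], str]
EntityExtractor = Callable[[str], Dict[str, str]]


def decode_plain_text(raw: bytes) -> str:
    """Toy stage (i): assume the input is already UTF-8 text.

    Hypothetical stand-in for an OCR/layout service call.
    """
    return raw.decode("utf-8")


def parse_key_values(text: str) -> Dict[str, str]:
    """Toy stage (ii): treat each 'Key: value' line as an entity."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"\s*([^:]+?)\s*:\s*(.+?)\s*$", line)
        if match:
            fields[match.group(1).lower()] = match.group(2)
    return fields


def run_pipeline(raw: bytes, extract_text: TextExtractor,
                 extract_entities: EntityExtractor) -> Dict[str, str]:
    """Run text extraction first, then entity extraction on its output."""
    return extract_entities(extract_text(raw))


doc = b"Part: Flange\nDiameter: 25 mm\nlines without a colon are ignored"
print(run_pipeline(doc, decode_plain_text, parse_key_values))
```

Keeping the stages behind plain function types means either one can be swapped for a service call without touching the other.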
- Rahul1202 · Copper Contributor
Thank you for the pointer, Jenapravat. In our experiments so far, the Azure AI Document Intelligence service is yielding great results. However, our initial observation is that it requires manual mapping for each document format to extract the text accurately. Ideally, we are looking for a model that can recognize similar terms across the various document types and extract the correct values without manual mapping. For example, a field named "diameter" in one document might be named "final diameter" in another. In other words, the model should be able to learn this on its own. Do you have experience with this, and could you guide us?
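One way to avoid a separate mapping per format is to map every extracted label onto a single canonical schema. The sketch below assumes a hypothetical alias table (`CANONICAL_FIELDS`) and uses Python's standard `difflib.get_close_matches` to absorb small spelling differences. It is not a model that trains itself: new synonyms still have to be added, but to one shared table rather than to a template per document format.

```python
import difflib
import re
from typing import Optional

# Hypothetical canonical schema: each target field lists its known aliases.
CANONICAL_FIELDS = {
    "diameter": ["diameter", "final diameter", "outer diameter", "dia"],
    "length": ["length", "overall length", "len"],
}

# Reverse index: alias -> canonical field name.
ALIAS_TO_FIELD = {
    alias: field
    for field, aliases in CANONICAL_FIELDS.items()
    for alias in aliases
}


def normalize_label(label: str) -> str:
    """Lowercase, turn punctuation into spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", label.lower())
    return " ".join(cleaned.split())


def map_label(label: str, cutoff: float = 0.8) -> Optional[str]:
    """Return the canonical field for a label, or None if nothing is close."""
    key = normalize_label(label)
    if key in ALIAS_TO_FIELD:
        return ALIAS_TO_FIELD[key]
    # Fall back to fuzzy matching for typos and small variations.
    close = difflib.get_close_matches(key, list(ALIAS_TO_FIELD), n=1, cutoff=cutoff)
    return ALIAS_TO_FIELD[close[0]] if close else None


for label in ["Final Diameter", "Dia.", "Overal Length", "Weight"]:
    print(label, "->", map_label(label))
```

The `cutoff` value trades recall for precision: lowering it catches more variants but risks mapping unrelated labels.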
- JamespaulG-0359 · Copper Contributor
Have you tried creating a custom Named Entity Recognition model in the Azure AI Language service?
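Custom NER models learn from example documents in which each entity is marked by its character offset and length. The sketch below shows only that span-labeling step, using hypothetical names (`EntitySpan`, `label_spans`) and sample text; it does not produce the labels file format that the Azure AI Language service expects, which is defined in that service's own documentation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class EntitySpan:
    category: str  # entity type, e.g. "Diameter" (hypothetical label name)
    offset: int    # character index where the entity starts
    length: int    # number of characters in the entity


def label_spans(text: str, examples: Dict[str, str]) -> List[EntitySpan]:
    """Locate each example value in the text and record its span.

    `examples` maps category -> the exact value as it appears in the text.
    Only the first occurrence of each value is labeled.
    """
    spans = []
    for category, value in examples.items():
        start = text.find(value)
        if start == -1:
            raise ValueError(f"{value!r} not found in text")
        spans.append(EntitySpan(category, start, len(value)))
    return sorted(spans, key=lambda s: s.offset)


text = "Final diameter: 25 mm, overall length: 300 mm"
for span in label_spans(text, {"Diameter": "25 mm", "Length": "300 mm"}):
    print(span, repr(text[span.offset:span.offset + span.length]))
```

Labeling the same category across documents that name it differently ("diameter", "final diameter") is what lets a trained model generalize beyond the exact wording.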
- JamespaulG-0359 · Copper Contributor
You could apply 'classical' NLP techniques such as entity recognition. In Power Automate, AI Builder comes with out-of-the-box models, and you also have the option to train your own custom model. If you could share some examples, I would be happy to guide you. I have experience with similar situations, extracting structured data from free-form, unstructured text in invoice descriptions.
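A 'classical', rule-based form of entity recognition for invoice descriptions can be as simple as a set of regular expressions. The patterns and field names below are hypothetical and written for an assumed description style; they are not AI Builder's models, which are trained rather than hand-written.

```python
import re
from typing import Dict

# Hypothetical patterns for one assumed invoice vocabulary; real rules
# would be derived from your own sample descriptions.
PATTERNS = {
    "quantity": re.compile(r"\b(?:qty|quantity)\s*[:=]?\s*(\d+)\b", re.IGNORECASE),
    "diameter": re.compile(
        r"\b(?:final\s+|outer\s+)?(?:diameter|dia)\.?\s*[:=]?\s*(\d+(?:\.\d+)?)\s*mm\b",
        re.IGNORECASE,
    ),
    "po_number": re.compile(r"\bPO[-\s]?(\d{4,})\b"),
}


def extract_fields(description: str) -> Dict[str, str]:
    """Return the first match of each pattern, keyed by field name."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(description)
        if match:
            fields[name] = match.group(1)
    return fields


print(extract_fields("Steel pipe, final diameter 25.4 mm, qty 40, ref PO-10233"))
print(extract_fields("Flange, Dia. 30mm, Qty: 2"))
```

Rules like these are transparent and quick to start with, but every new phrasing needs a new pattern, which is the maintenance cost a trained custom model aims to remove.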