Forum Discussion

Rahul1202
Copper Contributor
Feb 04, 2025

Using AI to convert unstructured information to structured information

We have a use case to extract information from various types of documents (Excel, PDF, and Word) and convert it into structured data. The data exists in different formats.

We started building this use case with AI Builder, hit a roadblock, and are now exploring Copilot Studio.
It would be great if someone could point us in the right direction.
What technology stack should we consider for this use case?
Thanks in advance for any pointers.

  • jenapravat
    Copper Contributor

    Extracting insights from documents is key to both analytical and research (digital) products. Information extraction can be broadly split into two tasks: (i) extracting the text from the document and (ii) extracting entities from that free text.

    For extracting information from PDFs, images, and similar files, we can use the Azure AI Document Intelligence service. It is powerful both at extracting all of the text (OCR) and at extracting specific entities as the business requires. However, if you only need to extract entities from free text, Named Entity Recognition in the Azure AI > Language service will be very helpful.

    Based on the business use case, we can decide which service fits better.
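
    As a rough sketch of what happens after extraction, the Python below flattens the keyValuePairs list of a Document Intelligence analyze result into a plain dictionary. It assumes the JSON shape of the v3 REST analyze result (each pair has key.content, an optional value, and a confidence score); please verify this against the API version you actually call. The function name, the confidence threshold, and the sample data are hypothetical.

```python
from typing import Any


def key_value_pairs_to_record(analyze_result: dict[str, Any], min_confidence: float = 0.5) -> dict[str, str]:
    """Flatten an analyze result's keyValuePairs into a {label: value} dict.

    Assumes the v3 REST shape: {"keyValuePairs": [{"key": {...}, "value": {...}, "confidence": float}]}.
    Pairs below min_confidence are skipped; later duplicate labels overwrite earlier ones.
    """
    record: dict[str, str] = {}
    for pair in analyze_result.get("keyValuePairs", []):
        # "value" can be missing or null when no value was detected for a key.
        key = (pair.get("key") or {}).get("content", "").strip().rstrip(":").strip()
        value = (pair.get("value") or {}).get("content", "").strip()
        if not key or pair.get("confidence", 0.0) < min_confidence:
            continue
        record[key] = value
    return record


# Hypothetical fragment of an analyze result, trimmed to the fields used here.
SAMPLE_RESULT = {
    "keyValuePairs": [
        {"key": {"content": "Part No:"}, "value": {"content": "A-1001"}, "confidence": 0.95},
        {"key": {"content": "Final Diameter"}, "value": {"content": "12.5 mm"}, "confidence": 0.88},
        {"key": {"content": "Notes"}, "confidence": 0.30},  # no value, low confidence
    ]
}

if __name__ == "__main__":
    print(key_value_pairs_to_record(SAMPLE_RESULT))
    # prints {'Part No': 'A-1001', 'Final Diameter': '12.5 mm'}
```

    This only flattens labels as they appear in each document, so "Final Diameter" in one file and "Diameter" in another still end up as different keys.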

    • Rahul1202
      Copper Contributor

      Thank you for the pointer, Jenapravat. From the experiments we have done so far, Azure AI Document Intelligence is yielding great results. However, our initial observation is that it requires manual mapping for each document format to extract values accurately. Ideally, we are looking for a model that can recognize similar terms across document types and extract the correct value without manual mapping. For example, a value labeled "diameter" in one document could be labeled "final diameter" in another. In other words, the model should be able to learn these variations on its own. Do you have experience with this, and could you guide us?
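
      One lightweight way to approximate this without a per-format template is to match each extracted label against a small canonical schema with known aliases. The sketch below uses string similarity from Python's standard library (difflib.SequenceMatcher) as a stand-in; it is a simple heuristic, not a model that trains itself, and the schema, aliases, and cutoff value are hypothetical.

```python
import difflib
import re
from typing import Optional

# Hypothetical canonical schema: field name -> aliases seen in past documents.
SCHEMA = {
    "diameter": ["diameter", "final diameter", "dia", "outer diameter"],
    "length": ["length", "overall length", "total length"],
    "material": ["material", "material grade"],
}


def normalize(label: str) -> str:
    """Lowercase, drop parenthesized units such as "(mm)", strip punctuation, collapse spaces."""
    label = re.sub(r"\(.*?\)", " ", label.lower())
    label = re.sub(r"[^a-z0-9 ]", " ", label)
    return " ".join(label.split())


def map_label(label: str, cutoff: float = 0.75) -> Optional[str]:
    """Return the canonical field whose alias is most similar to label, or None below cutoff."""
    cleaned = normalize(label)
    best_field, best_score = None, 0.0
    for field, aliases in SCHEMA.items():
        for alias in aliases:
            score = difflib.SequenceMatcher(None, cleaned, alias).ratio()
            if score > best_score:
                best_field, best_score = field, score
    return best_field if best_score >= cutoff else None


if __name__ == "__main__":
    for raw in ["Final Diameter (mm)", "Overall Length", "Supplier"]:
        print(raw, "->", map_label(raw))
    # Final Diameter (mm) -> diameter
    # Overall Length -> length
    # Supplier -> None
```

      Labels that fall below the cutoff come back as None and can be reviewed and added as aliases, so the mapping grows per field rather than per document format.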

      • JamespaulG-0359
        Copper Contributor

        Have you tried creating a custom Named Entity Recognition model in the Azure AI Language service?

  • JamespaulG-0359
    Copper Contributor

    You could apply "classical" NLP techniques such as entity recognition. In Power Automate, AI Builder comes with an out-of-the-box entity extraction model, and you also have the option to train your own custom model. If you could share some examples, I would be happy to guide you. I have experience with similar situations, extracting structured data from free-form, unstructured text in invoice descriptions.
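
    As a hypothetical illustration of the classical approach for invoice descriptions, the sketch below pulls a few fields out of free-text line items with regular expressions. It is a rule-based baseline, not AI Builder's entity extraction model; the description format, field names, and patterns are invented for illustration, and a trained NER model would replace the hand-written patterns once labeled examples are available.

```python
import re
from typing import Optional

# Hypothetical patterns for one style of invoice line description.
PATTERNS = {
    # Leading count such as "2 x" or "10 pcs".
    "quantity": re.compile(r"^\s*(\d+)\s*(?:x|pcs)\b", re.IGNORECASE),
    # "dia 25mm", "diameter 40.5 mm", "Dia. 12 mm".
    "diameter_mm": re.compile(r"\b(?:dia|diameter)\.?\s*(\d+(?:\.\d+)?)\s*mm\b", re.IGNORECASE),
    # Unit price introduced by "@".
    "unit_price": re.compile(r"@\s*(\d+(?:\.\d+)?)"),
}


def extract_fields(description: str) -> dict[str, Optional[str]]:
    """Return the first match for each pattern, or None when a field is absent."""
    fields: dict[str, Optional[str]] = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(description)
        fields[name] = match.group(1) if match else None
    return fields


if __name__ == "__main__":
    print(extract_fields("2 x Steel pipe, dia 25mm @ 14.50 EUR"))
    # prints {'quantity': '2', 'diameter_mm': '25', 'unit_price': '14.50'}
```

    Returning None for missing fields keeps the output shape fixed, which makes it easy to spot descriptions the patterns do not cover yet.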
