Forum Discussion
Using AI to convert unstructured information to structured information
We have a use case to extract information from various types of documents (Excel, PDF, and Word) and convert it into structured information. The data exists in different formats.
We started building this with AI Builder, but we hit a roadblock and are now exploring options in Copilot Studio.
It would be great if someone could point us in the right direction.
What would be the right technology stack to consider for this use case?
Thanks in advance for any pointers.
18 Replies
- BlueeyesWhite_Copper Contributor
Well yes, but I think generative AI is best reserved for routine transformations, rather than something people lean on as a crutch (to borrow a comment from another forum). I still do a lot of my writing in physical notebooks and sketchbooks; it comes down to whatever floats your boat.
Structured pipelines that store and compile data, like SQL or similar, can do a lot, yet it always seems it's not enough. There is also a lot of filtering in these tools that I don't find pleasing, and in some cases find derogatory. That's why it's important to convey a discernible truth.
- AIChief_Copper Contributor
This is a very relevant and powerful use case, and you're definitely on the right track by exploring Copilot Studio. From what we've seen at AIChief, converting unstructured data into structured formats requires a hybrid approach of AI plus traditional data processing pipelines.
Here’s a recommended tech stack for your use case:
Azure Form Recognizer / Document Intelligence – Excellent for extracting key-value pairs, tables, and layout data from PDFs, Word, and images.
Power Automate + AI Builder – Good for automating workflows, but can be limiting for complex document types. Can still be used to trigger processes post-extraction.
Azure OpenAI or Azure Cognitive Services (via LangChain) – Use GPT-powered models to extract or infer structured data from semi-structured formats, especially where templates vary.
Dataverse / SQL / Cosmos DB – For storing the extracted structured data and enabling analytics or visualization downstream.
If you're looking to scale this for enterprise use, consider layering a custom GPT or Co-Pilot model trained on your specific document formats.
We’ve published tools and curated insights at AIChief.com to help AI builders and teams working with similar automation use cases. Happy to connect or help further if needed!
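To make the "extraction, then storage" flow above concrete: services like Form Recognizer / Document Intelligence typically hand back key-value pairs with confidence scores, which you still need to normalize into your own schema before writing to Dataverse or SQL. Here is a minimal post-processing sketch; the field names, schema, and confidence threshold are illustrative assumptions, not from any specific service response.

```python
# Toy post-processing sketch: turn raw key-value pairs (as a
# form-extraction service might return them) into one structured record.
# Field names, schema, and threshold below are illustrative assumptions.

RAW_PAIRS = [
    {"key": "Invoice No.", "value": "INV-1042", "confidence": 0.98},
    {"key": "Total Amount", "value": "1,250.00", "confidence": 0.95},
    {"key": "Issue Date", "value": "2024-03-01", "confidence": 0.91},
]

# Map the labels a document uses to the canonical schema we store.
FIELD_MAP = {
    "invoice no.": "invoice_id",
    "total amount": "total",
    "issue date": "date",
}

def to_record(pairs, field_map, min_confidence=0.8):
    """Keep only confident pairs and rename keys to the canonical schema."""
    record = {}
    for pair in pairs:
        canonical = field_map.get(pair["key"].strip().lower())
        if canonical and pair["confidence"] >= min_confidence:
            record[canonical] = pair["value"]
    return record

print(to_record(RAW_PAIRS, FIELD_MAP))
# {'invoice_id': 'INV-1042', 'total': '1,250.00', 'date': '2024-03-01'}
```

Raising `min_confidence` drops low-certainty fields so they can be routed to a human-review queue instead of silently landing in the database.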
- ml4uBrass Contributor
For converting unstructured information to structured data, you can use a combination of tools and technologies. AI Builder is a good starting point for extracting data from documents. If you encounter limitations, consider using Azure Form Recognizer for more advanced extraction capabilities. Additionally, Power Automate can help automate the process, and Azure Cognitive Services can provide enhanced processing capabilities. Storing and managing the structured data in Dataverse or Azure Synapse can also be beneficial. This combination of tools can help streamline the process and improve accuracy.
- sharjeelasgharCopper Contributor
For extracting and structuring data from Excel, PDF, and Word, you can use platforms like Azure Form Recognizer, Power Automate, and Copilot Studio for automation. If AI Builder doesn't work, use Azure Cognitive Services or Python (Pandas, PyPDF2, OpenPyXL) for better control. Storing data in Dataverse or Synapse can help you structure the data.
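Since the inputs are mixed (Excel, PDF, Word), one pattern with the Python route is a small dispatcher that routes each file to a format-specific handler. A minimal sketch is below; the handlers are stubs (real ones would use openpyxl/Pandas, PyPDF2/pypdf, and python-docx, which are not shown here):

```python
import os

def extract_excel(path):
    # Stub: a real handler would read sheets with openpyxl or pandas.
    return {"source": path, "kind": "excel"}

def extract_pdf(path):
    # Stub: a real handler would pull text/tables with pypdf or an OCR service.
    return {"source": path, "kind": "pdf"}

def extract_word(path):
    # Stub: a real handler would walk paragraphs/tables with python-docx.
    return {"source": path, "kind": "word"}

# Route by file extension; unknown formats fail loudly.
HANDLERS = {
    ".xlsx": extract_excel,
    ".xls": extract_excel,
    ".pdf": extract_pdf,
    ".docx": extract_word,
}

def extract(path):
    """Dispatch a document to the handler for its file type."""
    ext = os.path.splitext(path)[1].lower()
    handler = HANDLERS.get(ext)
    if handler is None:
        raise ValueError(f"Unsupported format: {ext or path}")
    return handler(path)

print(extract("orders.pdf")["kind"])  # pdf
```

Keeping the handlers behind one `extract()` entry point means the downstream steps (normalization, storage in Dataverse or Synapse) never need to know which format a record came from.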
- ml4uBrass Contributor
For converting unstructured information to structured information, Azure Form Recognizer is a great tool for extracting data from documents. You can also use Power Automate for automation and explore Azure Cognitive Services for additional capabilities. These tools can help streamline the process and improve accuracy.
- ml4uBrass Contributor
Converting unstructured information to structured data is a common challenge. AI Builder is a good starting point, but if you're exploring other options, consider using Azure Cognitive Services for document processing. The Form Recognizer service can extract text, key-value pairs, and tables from documents. For more complex scenarios, you might explore custom machine learning models using Azure Machine Learning or leveraging pre-trained models in the Azure OpenAI Service. Ensure you preprocess the data appropriately and consider using a combination of techniques for optimal results.
- AbdulrhmanCopper Contributor
Hi Rahul
I think you're essentially looking for a model that can understand the semantic meaning of the data, not just the literal text and its position on the page.
You can fine-tune an LLM (like those in Azure OpenAI Service, or open-source models like BERT). You could train it on a dataset of documents where you've manually labeled the "diameter" field, even when it's expressed differently. The model would then learn to identify the "diameter" field in new, unseen documents, even if the wording is slightly different.
Also consider a knowledge graph: if you have a complex domain with many related concepts, a knowledge graph can be very helpful. Hope this helps.
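Even without fine-tuning, the synonym problem ("diameter" vs. "final diameter") can often be handled at prompt level by asking the model to fill a fixed JSON schema. A sketch of that approach is below; the schema fields are made up for illustration, and the model call is mocked, so in practice you would send `prompt` to your chosen LLM endpoint and parse its reply the same way.

```python
import json

# Sketch: prompt an LLM to normalize differently-named fields into one
# schema. SCHEMA_FIELDS and fake_llm are illustrative stand-ins.

SCHEMA_FIELDS = ["diameter_mm", "material", "part_number"]

def build_prompt(document_text):
    return (
        "Extract the following fields from the document and answer with "
        f"JSON only, using exactly these keys: {SCHEMA_FIELDS}. "
        "Treat synonyms as the same field (e.g. 'final diameter' means "
        "'diameter_mm').\n\nDocument:\n" + document_text
    )

def fake_llm(prompt):
    # Stand-in for a real completion call to an LLM service.
    return '{"diameter_mm": 42.5, "material": "steel", "part_number": "P-77"}'

def extract_fields(document_text, llm=fake_llm):
    reply = llm(build_prompt(document_text))
    data = json.loads(reply)
    # Keep only the schema keys, in case the model adds extras.
    return {k: data.get(k) for k in SCHEMA_FIELDS}

print(extract_fields("Final diameter: 42.5 mm, steel, part P-77"))
```

Constraining the model to a fixed key list and parsing the reply as JSON makes the output directly loadable into Dataverse or SQL without per-template mapping.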
- warshafCopper Contributor
For extracting structured data from Excel, PDF, and Word, consider Azure Form Recognizer, Power Automate, and Copilot Studio for automation. If AI Builder falls short, use Azure Cognitive Services or Python (Pandas, PyPDF2, OpenPyXL) for better control. Storing data in Dataverse or Synapse can help with structuring.
- jenapravatCopper Contributor
Extracting insights from documents is key to analytical as well as research (digital) products. Information extraction can be broadly classified into two tasks: (i) text extraction from the document and (ii) entity extraction from the free text.
For extracting insights from PDFs, images, etc., the Azure AI Document Intelligence service is powerful at extracting all the text (OCR capability) as well as specific entities per business need. For extracting entities from free text only, Named Entity Recognition under the Azure AI Language service will be very helpful.
Based on the business use case, we can decide which service fits better.
- Rahul1202Copper Contributor
Thank you for the pointer, jenapravat. From the experiment we did, we observed that the Azure AI Document Intelligence service yields great results. However, our initial observation is that it requires manual mapping for each different format to extract the text accurately. Ideally, we are looking for a model that can recognize similar terms across various document types and extract the accurate value without manual mapping. For example, a field called "diameter" in one document could be named "final diameter" in another. In other words, the model should be able to train itself. Is this something you have experience with and could guide us on?
- ml4uBrass Contributor
To address the challenge of extracting accurate information without manual mapping, consider using a combination of pre-trained models and custom fine-tuning. Pre-trained models can provide a good starting point, and fine-tuning them with your specific data can improve accuracy. Additionally, exploring techniques like transfer learning and embedding models can help the model learn semantic relationships between terms across different document types. This approach can reduce the need for manual mapping and improve the model's ability to generalize across various formats.
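The semantic-matching idea above can be prototyped cheaply before bringing in embedding models. A toy sketch using only stdlib string similarity is below; a production system would replace `SequenceMatcher` with embedding-vector similarity, but the shape of the solution (map each document label to the closest canonical field) is the same. The canonical field list and threshold are illustrative assumptions.

```python
from difflib import SequenceMatcher

# Canonical schema fields we want every document label mapped onto.
CANONICAL = ["diameter", "length", "material"]

def best_match(label, canonical=CANONICAL, threshold=0.6):
    """Map a document's field label to the closest canonical field name."""
    label = label.lower().strip()
    # Substring hits ("final diameter" contains "diameter") match directly.
    for name in canonical:
        if name in label:
            return name
    # Otherwise fall back to fuzzy string similarity.
    score, name = max(
        (SequenceMatcher(None, label, c).ratio(), c) for c in canonical
    )
    return name if score >= threshold else None

print(best_match("final diameter"))  # diameter
print(best_match("Diameter"))        # diameter
print(best_match("diametre"))        # diameter (fuzzy match catches the typo)
```

Labels that fall below the threshold return `None`, which is a natural hook for flagging a new document format for one-time human review instead of silently mis-mapping it.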