Forum Discussion
Rahul1202
Feb 04, 2025Copper Contributor
Using AI to convert unstructured information to structured information
We have a use case to extract the information from various types of documents like Excel, PDF, and Word and convert it into structured information. The data exists in different formats. We started ...
Abdulrhman
Mar 31, 2025Copper Contributor
Hi Rahul
I think you're essentially looking for a model that can understand the sementic meaning of the data, not just the literal text and its position on the page.
You Can fine-tune an LLM ( like those in Azure OpenAI Service, or open-source modules like BERT) also You could train the LLM on a dataset of documents where you've manually labeled the "diameter" field, even when it's expressed differently. The LLM would then learn to identify the "diameter" field in new, unseen documents, even if the wording is slightly different.
Consider a knowledge Graph if you have a complex domain with many related concepts, a knowledge graph can be very helpful.
hope this help