Feb 04, 2025

Using AI to convert unstructured information to structured information

We have a use case to extract information from various document types, such as Excel, PDF, and Word, and convert it into structured information. The data exists in different formats. We started ...

Rahul1202

Feb 13, 2025

Thank you for the pointer, Jenapravat. In our experiments so far, the Azure AI document service has produced great results. However, our initial observation is that it requires manual mapping for each document format to extract text accurately. Ideally, we are looking for a model that can recognize similar terms across document types and extract the correct values without manual mapping. For example, a field called "diameter" in one document might be called "final diameter" in another. In other words, the model should be able to train itself to recognize such equivalents. Do you have experience with this, and could you guide us?

ml4u

Apr 18, 2025

To extract accurate information without manual mapping, consider combining pre-trained models with custom fine-tuning. Pre-trained models provide a good starting point, and fine-tuning them on your own documents can improve accuracy. Techniques such as transfer learning and embedding models can also help the model learn semantic relationships between terms across document types, so that labels like "diameter" and "final diameter" are recognized as the same field. This approach can reduce the need for manual mapping and improve the model's ability to generalize across formats.
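A minimal sketch of the embedding idea, assuming you have already pulled raw field labels out of each document: embed every extracted label and every field in a fixed target schema, then map each label to its most similar schema field by cosine similarity. Everything here is hypothetical and for illustration only: the schema in `CANONICAL_FIELDS`, the helper names `embed`, `cosine`, and `map_label`, and the `0.3` threshold. The `embed` function is a toy character-trigram vectorizer standing in for a real embedding model. It only captures surface overlap ("final diameter" vs "diameter"), so true synonyms such as "OD" vs "diameter" would need a proper sentence-embedding model swapped in.

```python
import math
import re
from collections import Counter
from typing import Optional, Sequence

# Hypothetical target schema that every document should be mapped onto.
CANONICAL_FIELDS = ["diameter", "length", "material", "weight"]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: counts of character trigrams.

    Replace with a real sentence-embedding model so that semantically
    related labels (not just lexically similar ones) land close together.
    """
    # Lowercase, collapse punctuation/whitespace runs to single spaces, pad ends.
    normalized = "  " + re.sub(r"[^a-z0-9]+", " ", text.lower()).strip() + "  "
    return Counter(normalized[i:i + 3] for i in range(len(normalized) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def map_label(label: str,
              fields: Sequence[str] = CANONICAL_FIELDS,
              threshold: float = 0.3) -> Optional[str]:
    """Return the most similar schema field, or None if nothing is close enough."""
    label_vec = embed(label)
    scores = {field: cosine(label_vec, embed(field)) for field in fields}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


if __name__ == "__main__":
    for raw_label in ["Final Diameter", "Overall length (mm)", "Colour"]:
        print(raw_label, "->", map_label(raw_label))
    # Final Diameter -> diameter
    # Overall length (mm) -> length
    # Colour -> None
```

The threshold is a design choice: labels that do not clearly match any schema field come back as `None`, so they can be routed to a human reviewer instead of being forced into the wrong column.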