Forum Discussion
Using AI to convert unstructured information to structured information
Extracting insights from documents is key to analytical and research (digital) products. Information extraction can be broadly split into two tasks: (i) text extraction from the document and (ii) entity extraction from the free text.
For extracting insights from PDFs, images, etc., we can use the Azure AI Document Intelligence service, as it's powerful at extracting all the text (OCR capability) from these files as well as extracting specific entities as per business need. However, for extracting entities from free text only, Named Entity Recognition under Azure AI > Language service will be very helpful.
Based on the business use case, we can decide which service is the better fit.
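As a rough illustration of the first option, here is a minimal sketch that pulls generic key-value pairs out of a PDF with the `azure-ai-formrecognizer` Python SDK. It assumes that package is installed and that you have a Document Intelligence resource; `endpoint`, `key`, `path`, and the helper `pairs_to_dict` are placeholders introduced here, not something from this thread. The `prebuilt-document` model ID is an assumption too: newer API versions may expose key-value extraction differently, so check what your resource supports.

```python
def pairs_to_dict(pairs):
    """Collapse (label, value) tuples into a dict, skipping empty entries."""
    out = {}
    for label, value in pairs:
        if label and value:
            # Drop surrounding whitespace and a trailing colon from the label.
            out[label.strip().rstrip(":")] = value.strip()
    return out


def extract_key_values(endpoint, key, path):
    """Run a general prebuilt model and return the label/value pairs it finds."""
    # Imported here so the helper above can be used without the SDK installed.
    from azure.ai.formrecognizer import DocumentAnalysisClient
    from azure.core.credentials import AzureKeyCredential

    client = DocumentAnalysisClient(endpoint=endpoint, credential=AzureKeyCredential(key))
    with open(path, "rb") as f:
        # "prebuilt-document" is assumed here; swap in the model your resource offers.
        poller = client.begin_analyze_document("prebuilt-document", document=f)
        result = poller.result()

    pairs = [
        (kv.key.content if kv.key else None, kv.value.content if kv.value else None)
        for kv in (result.key_value_pairs or [])
    ]
    return pairs_to_dict(pairs)
```

The labels come back exactly as they are printed in each document, so differently worded labels for the same quantity still need to be reconciled afterwards.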
Thank you for the pointer, Jenapravat. In the experiments we have run so far, the Azure AI Document Intelligence service is yielding great results. However, our initial observation is that it requires manual mapping for each different format to extract the text accurately. Ideally, we are looking for a model that can recognize similar terms across the various document types and extract the correct value without manual mapping. For example, a diameter in one document could be named the final diameter in another document. In other words, the model should be able to train itself. Is this something you have experience with and can guide us on?
- ml4u · Apr 18, 2025 · Brass Contributor
To address the challenge of extracting accurate information without manual mapping, consider using a combination of pre-trained models and custom fine-tuning. Pre-trained models can provide a good starting point, and fine-tuning them with your specific data can improve accuracy. Additionally, exploring techniques like transfer learning and embedding models can help the model learn semantic relationships between terms across different document types. This approach can reduce the need for manual mapping and improve the model's ability to generalize across various formats.
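To make the embedding idea concrete, here is a minimal sketch of mapping whatever label a document uses onto a canonical field name by nearest-neighbour similarity. Everything in it is hypothetical: the `CANONICAL_FIELDS` table, the function names, and the 0.5 threshold are illustrative choices. The `embed` function is a toy bag-of-words stand-in so the example runs with the standard library only. It matches on shared words and cannot recognize true synonyms, so in practice you would replace it with a real embedding model.

```python
import math
import re
from collections import Counter

# Hypothetical canonical fields, each with a few example labels seen in documents.
CANONICAL_FIELDS = {
    "diameter": ["diameter", "final diameter", "outer diameter"],
    "length": ["length", "overall length"],
}


def embed(text):
    """Toy stand-in for an embedding model: lowercase bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def normalize_label(label, fields=CANONICAL_FIELDS, threshold=0.5):
    """Return the canonical field whose examples best match `label`, or None."""
    query = embed(label)
    best_field, best_score = None, 0.0
    for field, examples in fields.items():
        for example in examples:
            score = cosine(query, embed(example))
            if score > best_score:
                best_field, best_score = field, score
    # Below the threshold the label is treated as unknown rather than guessed.
    return best_field if best_score >= threshold else None
```

With this table, `normalize_label("Final Diameter")` resolves to `"diameter"`, while an unrelated label such as `"Serial number"` returns `None` and can be routed to a person for review. Adding reviewed labels back into the example lists is one simple way to let the mapping improve over time without per-format rules.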
- Feb 13, 2025
Have you tried creating a custom Named Entity Recognition model from Azure AI Language service?