Extracting Documents with Mixed Handwritten and Typed Text Using Azure AI Foundry's Document Field Extraction (Preview) Feature
In document processing, dealing with documents that contain a mix of handwritten and typed text presents a unique challenge. Often, these documents also feature handwritten corrections where certain sections are crossed out and replaced with corrected text. Ensuring that the final extracted content accurately reflects these corrections is crucial for maintaining data accuracy and usability. In our recent endeavors, we explored various tools to tackle this issue, with a particular focus on Document Intelligence Studio and Azure AI Foundry's new Field Extraction Preview feature.
The Challenge
Documents with mixed content types—handwritten and typed—can be particularly troublesome for traditional OCR (Optical Character Recognition) systems. These systems often struggle to recognize handwritten text accurately, especially when it coexists with typed text. Additionally, when handwritten corrections are involved, distinguishing between crossed-out text and the corrected text adds another layer of complexity, as the model can be confused about which value(s) to extract.
Our Approach
Initial Experiments with Pre-built Models
To address this challenge, we initially turned to Document Intelligence Studio's pre-built invoice model, which provided a solid starting point. However, it would often extract both the crossed-out value and the new handwritten value under the same field. In addition, it did not always match the correct key to its field value.
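For reference, the prebuilt invoice model can be called programmatically through the Document Intelligence REST API. The sketch below is illustrative, not our exact setup: the endpoint, key, and API version are placeholders you would replace with your own resource's values, and the API version should be checked against the current documentation.

```python
import json
import time
import urllib.request


def analyze_invoice(endpoint: str, key: str, pdf_path: str) -> dict:
    """Submit a PDF to the prebuilt invoice model and poll for the result.

    `endpoint` and `key` come from your Document Intelligence resource;
    the API version below is an assumption -- verify it against the docs.
    """
    url = (f"{endpoint}/formrecognizer/documentModels/"
           "prebuilt-invoice:analyze?api-version=2023-07-31")
    with open(pdf_path, "rb") as f:
        body = f.read()
    req = urllib.request.Request(
        url, data=body, method="POST",
        headers={"Ocp-Apim-Subscription-Key": key,
                 "Content-Type": "application/pdf"})
    with urllib.request.urlopen(req) as resp:
        # The service accepts the request and returns a polling URL.
        op_url = resp.headers["Operation-Location"]
    while True:
        poll = urllib.request.Request(
            op_url, headers={"Ocp-Apim-Subscription-Key": key})
        with urllib.request.urlopen(poll) as resp:
            result = json.load(resp)
        if result.get("status") in ("succeeded", "failed"):
            return result
        time.sleep(2)
```

Inspecting the returned field values (and their confidence scores) is how we noticed the crossed-out and corrected values landing in the same field.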
Custom Neural Model Training
Next, we trained a custom neural model in Document Intelligence Studio, which leverages deep learning to predict key document elements and allows for further adjustment and refinement. Using at least 100 (and up to 1,000) sample files is recommended to achieve more accurate and consistent results. When training models, it is crucial to use text-based PDFs (PDFs with selectable text), as they provide better data for training. The model's accuracy improves with more varied training data, including different types of handwritten edits; without enough training data or variance, the model may overgeneralize. We therefore uploaded approximately 100 text-based PDFs to Azure AI Foundry and manually corrected the column containing handwritten text. After training on a subset of these files, we built and tested our custom neural model on the training data. The model performed impressively, achieving a 92% confidence score in identifying the correct values. The main drawbacks were the manual effort required for data labeling and the 30 minutes needed to build the model.
During our experiments, we noticed that when extracting fields from a table, labeling and extracting every column comprehensively, rather than just a few, resulted in higher accuracy. The model predicted better when it had a complete view of the table.
Breakthrough with Document Field Extraction (Preview)
Finally, the breakthrough came when we leveraged the new Document Field Extraction Preview feature from Azure AI Foundry. This feature demonstrated significant improvements in handling mixed content and provided a more seamless experience in extracting the necessary information.
Field Description Modification: One of the key steps in our process was modifying the field descriptions within the Field Extraction Preview feature. By providing detailed descriptions of the fields we wanted to extract, we helped the AI understand the context and nuances of our documents better. Specifically, we wanted to make sure that the value extracted for FOB_COST was the handwritten correction, so we wrote in the Field Description: "Ignore strikethrough or 'x'-ed out text at all costs, for example: do not extract red / black pen or marks through text. Do not use stray marks. This field only has numbers."
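Conceptually, the field definition we configured looks like the following. The dict shape here is illustrative only (the preview feature's actual schema may differ); the field name FOB_COST and the description text are taken from our use case.

```python
# Illustrative field definition -- the exact schema used by the
# Field Extraction preview may differ. The description steers the
# model away from crossed-out values and toward the corrections.
fob_cost_field = {
    "name": "FOB_COST",
    "type": "number",
    "description": (
        "Ignore strikethrough or 'x'-ed out text at all costs, "
        "for example: do not extract red / black pen or marks "
        "through text. Do not use stray marks. "
        "This field only has numbers."
    ),
}
```

The key design choice is that the description reads like an instruction to the model, not just a label: it names the failure mode (extracting struck-through values) and constrains the expected output (numbers only).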
Correction Handling: During the extraction process, the AI was able to distinguish between crossed-out text and the handwritten corrections. Whenever a correction was detected, the AI prioritized the corrected text over the crossed-out content, ensuring that the final extracted data was accurate and up-to-date.
Performance Evaluation: After configuring the settings and field descriptions, we ran several tests to evaluate the performance of the extraction process. The results were impressive, with the AI accurately extracting the corrected text and ignoring the crossed-out sections. This significantly reduced the need for manual post-processing and corrections.
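The manual review that remains can be narrowed with a simple confidence gate: only route fields below a threshold to a human. This is a minimal sketch under our own assumptions; the field names, result shape, and the 0.92 threshold (matching the confidence level we observed in testing) are illustrative.

```python
def needs_review(fields: dict, threshold: float = 0.92) -> list:
    """Return names of fields whose confidence falls below `threshold`.

    `fields` maps a field name to {"value": ..., "confidence": ...},
    loosely mirroring the shape of an extraction result. The default
    threshold is illustrative, not a service-defined value.
    """
    return [name for name, f in fields.items()
            if f.get("confidence", 0.0) < threshold]


# Hypothetical result: FOB_COST clears the bar; TOTAL gets flagged.
sample = {
    "FOB_COST": {"value": "41.50", "confidence": 0.95},
    "TOTAL": {"value": "812.00", "confidence": 0.61},
}
print(needs_review(sample))  # → ['TOTAL']
```

Gating on confidence keeps human effort focused on the handful of fields the model is unsure about instead of every document.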
Results
The new Field Extraction Preview feature in Azure AI Foundry exceeded our expectations. The modifications we made to the field descriptions, coupled with the AI's advanced capabilities, resulted in a highly efficient and accurate document extraction process. The AI's ability to handle mixed-content documents and prioritize handwritten corrections over crossed-out text has been a game-changer for our workflow.
Conclusion
For anyone dealing with documents that contain a mix of handwritten and typed text, and where handwritten corrections are present, we highly recommend exploring Azure AI Foundry's Field Extraction Preview feature. The improvements in accuracy and efficiency can save significant time and effort, ensuring that your extracted data is both reliable and usable. As we continue to refine our processes, we look forward to even more advancements in document intelligence technologies.