With Cha Zhang, Yi Zhou, and Wei Zhang, and with links to research papers by Qiang Huo and colleagues.
Microsoft Read OCR technology, now in its third generally available (GA) release, is offered as a cloud service and Docker container as part of Microsoft Cognitive Services’ Computer Vision API. Starting with version 3.0, Form Recognizer adds the Read OCR model to its document intelligence product line. Customers no longer need to choose between two cloud services when deciding whether to use OCR or the higher-end document intelligence capabilities; they now have access to the full range of document processing capabilities within a unified API and SDK experience.
Read OCR in Form Recognizer reflects a laser focus on advanced document scenarios for the next wave of OCR improvements. In this article, we briefly review common OCR challenges and how Read solves them today, before covering the new features and AI quality improvements in Form Recognizer 3.0.
Building generic OCR technology that recognizes and extracts text with very high accuracy, regardless of content format and language, comes with many challenges.
Read uses multiple deep learning models and algorithms to detect and extract text in hundreds of languages while handling these challenges. The following illustration shows the models and the flow at a high level.
The first step is to identify text lines, along with their curvature and orientation, and then group them together to form text regions. Text region detection is a specialized case of visual object detection, so this step is based on the popular Faster R-CNN object detection model, enhanced with an implementation based on the research paper, Anchor-free and Scale-friendly Region Proposal Network (AF-RPN).
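Read’s detector itself is proprietary, but the general shape of region-proposal-based text detection can be sketched with an off-the-shelf model. The snippet below is purely illustrative and assumes PyTorch and torchvision: it takes the stock Faster R-CNN, swaps in a two-class head (background plus a single “text region” class), and runs it on an image tensor. The production model replaces the anchor-based region proposal network with the anchor-free AF-RPN design.

import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Stock Faster R-CNN backbone; Read uses an anchor-free (AF-RPN) variant instead.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn()

# Replace the classification head: background + one "text region" class.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=2)

model.eval()
with torch.no_grad():
    # A dummy 3-channel image; in practice this is the scanned page.
    image = torch.rand(3, 800, 600)
    detections = model([image])[0]

# Each detection is a candidate text region with a box and a confidence score.
for box, score in zip(detections["boxes"], detections["scores"]):
    print(box.tolist(), float(score))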
Once the text regions are detected, they are grouped together with techniques based on the paper, “A Relation Network Based Approach to Curved Text Detection.”
Once detected, the text lines are recognized with an integrated convolutional neural network (CNN) and deep bidirectional long short-term memory (DBLSTM) model, in combination with traditional statistical models such as weighted finite-state transducers (WFSTs). The relevant research references are the papers “Compact and Efficient WFST-based Decoders for Handwriting Recognition” and “A Compact CNN-DBLSTM Based Character Model for Offline Handwriting Recognition with Tucker Decomposition.” The input to this decoder includes a lexicon, language models, and universal script-based character models for the supported languages.
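To make the CNN-DBLSTM idea concrete, here is a minimal, illustrative character model written with PyTorch (our own sketch, not the production architecture described in the papers above): a small convolutional stack turns a text-line image into a sequence of frames, a deep bidirectional LSTM adds left and right context, and a linear layer produces per-frame character scores that a CTC- or WFST-based decoder would combine with a lexicon and language model.

import torch
import torch.nn as nn

class TinyCnnDblstm(nn.Module):
    """Illustrative CNN-DBLSTM character model for text-line recognition (not the production model)."""

    def __init__(self, num_chars: int, height: int = 32):
        super().__init__()
        # CNN front end: extracts local visual features from the text-line image.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # halves height and width
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        feat_dim = 64 * (height // 4)  # channels * remaining height
        # Deep bidirectional LSTM: models left-to-right and right-to-left context.
        self.dblstm = nn.LSTM(feat_dim, 128, num_layers=2,
                              bidirectional=True, batch_first=True)
        # Per-frame character scores; a WFST/CTC decoder turns these into text.
        self.classifier = nn.Linear(2 * 128, num_chars)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        # images: (batch, 1, height, width) grayscale text-line crops
        feats = self.cnn(images)  # (batch, 64, height/4, width/4)
        b, c, h, w = feats.shape
        seq = feats.permute(0, 3, 1, 2).reshape(b, w, c * h)  # one frame per horizontal step
        out, _ = self.dblstm(seq)
        return self.classifier(out).log_softmax(dim=-1)  # (batch, frames, num_chars)

# Example: a batch of two 32x256 text-line images over a 100-character alphabet.
model = TinyCnnDblstm(num_chars=100)
scores = model(torch.rand(2, 1, 32, 256))
print(scores.shape)  # torch.Size([2, 64, 100])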
The goal of any OCR technology today is to rapidly scale out to support global languages with every release. Read OCR supports hundreds of languages today. To do so efficiently, instead of building individual language models, Read takes the different approach shown in the following illustration.
The text lines from the detector are input to script-based models. These include script-based character models, language models, and rejection models. Each script-based model supports all languages that use that script; in fact, the OCR service has no knowledge of the specific languages present in the image.
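As a rough illustration of why this scales (our own sketch; the script names, groupings, and model names below are assumptions, not the service’s internals), a recognizer is selected per script rather than per language, so a single Latin-script model covers English, French, Spanish, and many more:

# Illustrative only: route detected text lines to a per-script recognizer.
# Script names and groupings here are examples, not the service's internal list.
SCRIPT_TO_LANGUAGES = {
    "Latin": ["en", "fr", "es", "de", "it", "pt"],
    "Cyrillic": ["ru", "uk", "bg", "sr"],
    "Devanagari": ["hi", "mr", "ne"],
    "Arabic": ["ar", "fa", "ur"],
}

def supported_languages() -> list:
    """One model per script covers every language written in that script."""
    return sorted(lang for langs in SCRIPT_TO_LANGUAGES.values() for lang in langs)

def pick_model(detected_script: str) -> str:
    """Select the script-based character model; the specific language is never needed."""
    if detected_script not in SCRIPT_TO_LANGUAGES:
        return "rejection-model"  # filters non-text or unsupported scripts
    return f"{detected_script.lower()}-char-model"

print(pick_model("Cyrillic"))      # cyrillic-char-model
print(len(supported_languages()))  # many languages from only a few script models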
Read, as the foundational OCR model, now supports 164 languages in Form Recognizer 3.0 GA. Form Recognizer’s Layout and Custom template models support the same languages. The major additions are the Cyrillic, Arabic, and Devanagari scripts and the languages written in them.
The following screenshot from the Form Recognizer Studio shows an example of OCR for Russian text.
The following screenshot from the Form Recognizer Studio shows an example of OCR for Arabic text.
The following screenshot from the Form Recognizer Studio shows an example of OCR for Hindi text.
Form Recognizer Read supports OCR for handwritten text in Chinese Simplified, French, German, Italian, Japanese, Korean, Portuguese, and Spanish. By implication, Layout and Custom Forms also support handwritten text in these languages.
Typically, forms contain both printed and handwritten text on the same page. The following examples show handwritten text that was previously skipped and is now correctly extracted in Form Recognizer v3.0.
Form Recognizer v2.1 (2021) | Form Recognizer v3.0 (2022)
The Read model extracts all identified blocks of text as part of the paragraphs collection. Each entry in this collection groups individual text lines together along with the bounding polygon coordinates of the identified text block. The span information points to the text fragment within the top-level content property that contains the full text from the document.
"paragraphs": [
{
"spans": [],
"boundingRegions": [],
"content": "While healthcare is still in the early stages of its Al journey, we are seeing pharmaceutical and other life sciences organizations making major investments in Al and related technologies.\" TOM LAWRY | National Director for Al, Health and Life Sciences | Microsoft"
}
]
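With the Python SDK (azure-ai-formrecognizer), the same information is available on the analysis result object. The short sketch below assumes result is the AnalyzeResult returned by the prebuilt-read model, as shown in the quickstart later in this article:

# Assumes `result` is the AnalyzeResult returned by the prebuilt-read model,
# e.g. result = poller.result() after begin_analyze_document_from_url(...)
for paragraph in result.paragraphs:
    # Each paragraph groups text lines and carries its bounding polygon per page region.
    pages = [region.page_number for region in paragraph.bounding_regions]
    print(f"Paragraph on page(s) {pages}: {paragraph.content}")
    for span in paragraph.spans:
        # The span points back into the full-document text in result.content.
        print("  span text:", result.content[span.offset : span.offset + span.length])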
Read adds language detection as a new feature for text lines. Read predicts the primary detected language for each text line, along with a confidence score.
"languages": [
{
"spans": [
{
"offset": 0,
"length": 131
}
],
"locale": "en",
"confidence": 0.7
},
]
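The detected languages are exposed on the same result object in the Python SDK; again, this sketch assumes a prebuilt-read AnalyzeResult as in the quickstart below:

# Assumes `result` is the AnalyzeResult from the prebuilt-read model (see the quickstart below).
for language in result.languages:
    print(f"Detected locale '{language.locale}' with confidence {language.confidence}")
    for span in language.spans:
        # Spans identify which stretch of result.content was written in this locale.
        print("  text:", result.content[span.offset : span.offset + span.length])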
Use api-version=2022-06-30-preview with the REST API, or the corresponding SDK preview for that API version, to preview support for Microsoft Word, Excel, PowerPoint, and HTML files. The service extracts all text, including text from any embedded images, and returns it in the output.
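For example, with a preview SDK release that targets that API version (an assumption here; check the SDK changelog for the matching package version), a Word document can be analyzed from a local file:

from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential

# Requires a preview SDK release that targets api-version 2022-06-30-preview.
client = DocumentAnalysisClient("<your-endpoint>", AzureKeyCredential("<your-key>"))

with open("report.docx", "rb") as f:  # hypothetical local Word document
    poller = client.begin_analyze_document("prebuilt-read", document=f)

result = poller.result()
print(result.content)  # full text, including text found in embedded images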
Dates are business-critical data for automated workflows built on OCR results. The following examples show the improvement in date extraction in the new release.
Form Recognizer v2.1 (2021) | Form Recognizer v3.0 (2022)
Forms commonly use character boxes that hold one character each, which makes them easier for humans to fill in but harder for machines to extract reliably. The following examples show the improvement in boxed-character extraction in the new release.
Form Recognizer v2.1 (2021) | Form Recognizer v3.0 (2022)
Checks, whose MICR (magnetic ink character recognition) text contains important account information, are the backbone of many financial workflow systems. The following examples show the improvement in MICR text extraction in the new release.
Form Recognizer v2.1 (2021) | Form Recognizer v3.0 (2022)
The COVID-19 pandemic and the related economic and labor constraints have pushed many human-in-the-loop processes toward automated data capture and processing. The following examples show the improvement in LED-style text extraction in the new release.
Form Recognizer v2.1 (2021) | Form Recognizer v3.0 (2022)
Get started with the new Read model in Form Recognizer through any of the following options:
1. Try it in the Form Recognizer Studio: create a Form Recognizer resource in Azure, then run the Read model on the sample document or on your own documents.
2. Refer to the OCR SDK QuickStart for complete code samples in .NET, Python, JavaScript, and Java. For example, in Python:
from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential

endpoint = "<your-form-recognizer-endpoint>"
key = "<your-form-recognizer-key>"

# sample form document
formUrl = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/rest-api/read.png"

# create your `DocumentAnalysisClient` instance with an `AzureKeyCredential`
document_analysis_client = DocumentAnalysisClient(
    endpoint=endpoint, credential=AzureKeyCredential(key)
)

# analyze the document with the prebuilt-read model and wait for the result
poller = document_analysis_client.begin_analyze_document_from_url(
    "prebuilt-read", formUrl)
result = poller.result()

print("Document contains content: ", result.content)
3. Follow the OCR REST API QuickStart. It takes just two operations to extract the text: submit the document for analysis, then retrieve the result.
curl -v -i -X POST "{endpoint}/formrecognizer/documentModels/prebuilt-read:analyze?api-version=2022-08-31" -H "Content-Type: application/json" -H "Ocp-Apim-Subscription-Key: {key}" --data-ascii "{'urlSource': 'https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/rest-api/read.png'}"
The POST call returns a 202 response with an Operation-Location header; use the resultId from that URL in the follow-up GET request:
curl -v -X GET "{endpoint}/formrecognizer/documentModels/prebuilt-read/analyzeResults/{resultId}?api-version=2022-08-31" -H "Ocp-Apim-Subscription-Key: {key}"
Fujitsu is the world leader in document scanning technology, with more than 50 percent of global market share, but that doesn't stop the company from constantly innovating. To improve the performance and accuracy of its cloud scanning solution, Fujitsu incorporated Azure Form Recognizer’s OCR technology. It took only a few months to deploy the new technologies, and they have boosted character recognition rates as high as 99.9 percent. This collaboration helps Fujitsu deliver market-leading innovation and give its customers powerful and flexible tools for end-to-end document management.
Learn more about Fujitsu’s OCR story and other Form Recognizer customer successes.
The Form Recognizer v3.0 announcement article covers all new capabilities and enhancements. Be sure to check it out. Refer to the following resources to learn more and get started.