Read OCR by Form Recognizer targets advanced document text extraction scenarios
Published Sep 19 2022 07:00 AM
Microsoft

With Cha Zhang, Yi Zhou, Wei Zhang and links to research papers by Qiang Huo and colleagues.

 

Microsoft Read OCR technology, now in its third generally available (GA) release, is available as a cloud service and Docker container as part of Microsoft Cognitive Services’ Computer Vision API. Starting with version 3.0, Form Recognizer adds the Read OCR model to its document intelligence product line. Customers no longer need to choose between two cloud services when deciding whether to use OCR or the higher-end document intelligence capabilities. They now have access to the full range of document processing capabilities within a unified API and SDK experience.

Form Recognizer block diagram

 

Read OCR in Form Recognizer reflects a deliberate focus on advanced document scenarios for the next wave of OCR improvements. This article briefly reviews OCR challenges and how Read addresses them today, then covers the new features and AI quality improvements in Form Recognizer 3.0.

OCR challenges

Building generic OCR technology that recognizes and extracts text with very high accuracy regardless of the content format and language comes with many challenges:

 

  • Large variability in size, quality, resolution, and orientation
  • Wide range of aspect ratios
  • Skewed or curved text lines, for example on posters, banners, and street signs
  • Adjacent small-sized text, for example where the inter-line space is less than 2 pixels
  • Complex or ambiguous layouts, for example mixed symbols, text, and diagrams
  • Text-like backgrounds, for example fences, bricks, and stripes
  • Global language coverage, now 164+ languages with more to come

OCR challenging images - examples

 

Read OCR uses multiple deep learning models

Read uses multiple deep learning models and algorithms to detect and extract text in more than 160 languages while handling the variables listed in the previous section. The following illustration shows the models and the flow at a high level.

 

OCR models overview

 

 

Text detection models

The first step is to detect text lines, including their curvature and orientation, and then group them into text regions. Text region detection is a specialized case of visual object detection, so this process is based on the popular Faster R-CNN object detection model, enhanced with an implementation based on the research paper, Anchor-free and Scale-friendly Region Proposal Network (AF-RPN).

 

Once the text regions are detected, they are grouped together with techniques based on the paper “A Relation Network Based Approach to Curved Text Detection.”

Universal recognizer models

Once the text lines are detected, their text is extracted with an integrated convolutional neural network (CNN) and deep bidirectional long short-term memory (DBLSTM) model in combination with traditional statistical models like weighted finite state transducers (WFST). The relevant research references are the papers “Compact and Efficient WFST-based Decoders for Handwriting Recognition” and “A Compact CNN-DBLSTM Based Character Model For Offline Handwriting Recognition with Tucker Decomposi...”. The input to the decoder includes a lexicon, language models, and universal script-based character models for the supported languages.

 

“Universal” text recognizer

The goal of any OCR technology today is to scale rapidly to more of the world's languages with every release. Read OCR supports more than 160 languages today. To do so efficiently, instead of building a separate model for each language, Read takes the script-based approach shown in the following illustration.

OCR script-based recognition models

 

The text lines from the detector are input to script-based models. These models include script-based character models, language models, and rejection models. Each script-based model supports every language that uses that script. In fact, the OCR service has no knowledge of the specific languages present in the image.
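To illustrate why script-based models scale well, the following minimal sketch groups a handful of languages by the script they are written in. The language list is a small hand-picked sample for illustration only, not the service's supported-language list, and `languages_per_script` is a hypothetical helper, not part of any Form Recognizer API.

```python
from collections import defaultdict

# Illustrative sample only: a few languages and the script each is written in.
LANGUAGE_TO_SCRIPT = {
    "English": "Latin",
    "French": "Latin",
    "German": "Latin",
    "Russian": "Cyrillic",
    "Ukrainian": "Cyrillic",
    "Arabic": "Arabic",
    "Hindi": "Devanagari",
    "Marathi": "Devanagari",
}


def languages_per_script(language_to_script):
    """Group language names by script (hypothetical helper for illustration)."""
    grouped = defaultdict(list)
    for language, script in language_to_script.items():
        grouped[script].append(language)
    return {script: sorted(names) for script, names in grouped.items()}


for script, names in languages_per_script(LANGUAGE_TO_SCRIPT).items():
    print(f"{script} model covers: {', '.join(names)}")
```

In this sample, four script-level models would cover eight languages, and adding another Latin-script or Cyrillic-script language would not require a new model.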

 

New features and enhancements

Print OCR for Cyrillic, Arabic, and Devanagari languages

Read, as the foundational OCR model, now supports 164 languages in Form Recognizer 3.0 GA. Form Recognizer’s Layout and Custom template model capabilities also support the same languages. The major additions are the Cyrillic, Arabic, and Devanagari scripts and the languages that use them.

The following screenshot from the Form Recognizer Studio shows an example of OCR for Russian text.

OCR for Russian example

 

The following screenshot from the Form Recognizer Studio shows an example of OCR for Arabic text.

OCR for Arabic example

 

The following screenshot from the Form Recognizer Studio shows an example of OCR for Hindi text.

OCR for Hindi example

 

Handwriting OCR for Chinese, Japanese, Korean, and Latin languages

Form Recognizer Read supports OCR for handwritten text in Chinese Simplified, French, German, Italian, Japanese, Korean, Portuguese, and Spanish. By implication, Layout and Custom Forms also support handwritten text in these languages.

 

Typically, forms contain both printed and handwritten text on the same page. The following examples show handwritten text that was previously skipped and is now correctly extracted in Form Recognizer v3.0.

 

Form Recognizer v2.1 (2021)

Form Recognizer v2.1 OCR example

Form Recognizer v3.0 (2022)

Form Recognizer v3.0 OCR example

 

Paragraphs

The Read model extracts all identified blocks of text as part of the paragraphs collection. Each entry in this collection groups individual text lines together along with the bounding polygon coordinates of the identified text block. The span information points to the text fragment within the top-level content property that contains the full text from the document.

 

 

 

 

 

"paragraphs": [
	{
	    "spans": [],
	    "boundingRegions": [],
	    "content": "While healthcare is still in the early stages of its Al journey, we are seeing pharmaceutical and other life sciences organizations making major investments in Al and related technologies.\" TOM LAWRY | National Director for Al, Health and Life Sciences | Microsoft"
	}
]
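As a rough illustration of how spans relate to the top-level content property, the sketch below slices the content string with each paragraph's offset and length. The sample document, its offsets, and the `text_for_spans` helper are made up for this example, and it assumes the offsets index the string the same way Python indexes a `str`.

```python
# Hypothetical analysis result: a top-level "content" string plus paragraphs
# whose spans point back into it. Offsets and text are invented for this sketch.
analyze_result = {
    "content": "Annual report\nRevenue grew in every region.",
    "paragraphs": [
        {
            "spans": [{"offset": 0, "length": 13}],
            "boundingRegions": [],
            "content": "Annual report",
        },
        {
            "spans": [{"offset": 14, "length": 29}],
            "boundingRegions": [],
            "content": "Revenue grew in every region.",
        },
    ],
}


def text_for_spans(content, spans):
    """Return the slice of the full content that the given spans point to."""
    return "".join(
        content[span["offset"]: span["offset"] + span["length"]] for span in spans
    )


for paragraph in analyze_result["paragraphs"]:
    sliced = text_for_spans(analyze_result["content"], paragraph["spans"])
    # In this sample, the slice matches the paragraph's own content field.
    print(repr(sliced), sliced == paragraph["content"])
```

Keeping spans rather than copies of the text lets a caller map any paragraph back to its exact position in the full document text.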

 

 

 

 

 

Language detection

Read adds language detection as a new feature for text lines. Read predicts the primary language of each text line, along with a confidence score.

 

 

 

 

 

"languages": [
    {
        "spans": [
            {
                "offset": 0,
                "length": 131
            }
        ],
        "locale": "en",
        "confidence": 0.7
    }
]
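One way to consume the languages collection is to group the detected text by locale and drop low-confidence predictions. The sketch below does that on invented sample data. The `text_by_locale` helper and the 0.7 threshold are assumptions for illustration, and the offsets are assumed to index the content string as Python indexes a `str`.

```python
# Hypothetical sample: two detected languages with spans into the full content.
content = "Hello world. Bonjour le monde."
languages = [
    {"spans": [{"offset": 0, "length": 12}], "locale": "en", "confidence": 0.9},
    {"spans": [{"offset": 13, "length": 17}], "locale": "fr", "confidence": 0.6},
]


def text_by_locale(content, languages, min_confidence=0.0):
    """Collect the text fragments for each locale at or above min_confidence."""
    grouped = {}
    for entry in languages:
        if entry["confidence"] < min_confidence:
            continue  # skip predictions the caller does not trust
        for span in entry["spans"]:
            fragment = content[span["offset"]: span["offset"] + span["length"]]
            grouped.setdefault(entry["locale"], []).append(fragment)
    return grouped


print(text_by_locale(content, languages))
print(text_by_locale(content, languages, min_confidence=0.7))
```

A suitable confidence threshold depends on the documents being processed, so treat the value used here as a placeholder.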

 

 

 

 

 

Microsoft Office and HTML support (preview)

Use api-version=2022-06-30-preview with the REST API, or the corresponding SDKs for that API version, to preview support for Microsoft Word, Excel, PowerPoint, and HTML files. The service extracts all text, including text in embedded images, and returns it in the output.

Form Recognizer Office and HTML support preview

Date extraction

Dates are important business data that are critical for automatic business workflows based on OCR results. The following examples show the improvement in date extraction in the new release.

Form Recognizer v2.1 (2021)

Form Recognizer v2.1 OCR example

Form Recognizer v3.0 (2022)

Form Recognizer v3.0 OCR example

 

Boxed character extraction

Forms commonly have character boxes, each holding a single character, that help people fill in the form but make the text harder for machines to extract reliably. The following examples show the improvement in boxed character extraction in the new release.

Form Recognizer v2.1 (2021)

Form Recognizer v2.1 OCR example

Form Recognizer v3.0 (2022)

Form Recognizer v3.0 OCR example

 

Check MICR text extraction

Checks with MICR text that contains important account information are the backbone of any financial workflow system. The following examples show the improvement in MICR text extraction in the new release.

Form Recognizer v2.1 (2021)

Form Recognizer v2.1 OCR example

Form Recognizer v3.0 (2022)

Form Recognizer v3.0 OCR example

 

LED text extraction

The Covid pandemic and the related economic and labor situation have pushed many human-in-the-loop processes toward automated data capture and processing models. The following examples show the improvement in LED-style text extraction in the new release.

Form Recognizer v2.1 (2021)

Form Recognizer v2.1 OCR example

Form Recognizer v3.0 (2022)

Form Recognizer v3.0 OCR example

 

Get Started with Form Recognizer Read OCR

Get started with the new Read model in Form Recognizer using one of the following options:

 

1. Try it in Form Recognizer Studio by creating a Form Recognizer resource in Azure and trying it out on the sample document or on your own documents.

Form Recognizer Studio OCR demo

2. Refer to the OCR SDK QuickStart for complete code samples in .NET, Python, JavaScript, and Java.

 

 

 

 

 

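# requires the azure-ai-formrecognizer package (3.2.0 or later)
from azure.core.credentials import AzureKeyCredential
from azure.ai.formrecognizer import DocumentAnalysisClient

# replace these placeholders with your own resource endpoint and key
endpoint = "<your-form-recognizer-endpoint>"
key = "<your-form-recognizer-key>"

# sample form document
formUrl = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/rest-api/read.png"

# create your `DocumentAnalysisClient` instance and `AzureKeyCredential` variable
document_analysis_client = DocumentAnalysisClient(
    endpoint=endpoint, credential=AzureKeyCredential(key)
)

# start the analysis with the prebuilt Read model and wait for it to finish
poller = document_analysis_client.begin_analyze_document_from_url(
    "prebuilt-read", formUrl
)
result = poller.result()

print("Document contains content: ", result.content)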
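Beyond the full content string, the result can also be walked page by page. The following minimal sketch assumes `result` is the object returned by `poller.result()` and reads only its `pages`, `page_number`, `lines`, and `content` attributes. The `lines_by_page` helper is hypothetical, and the stand-in object exists only so the sketch runs without calling the service.

```python
from types import SimpleNamespace


def lines_by_page(result):
    """Map each page number to the list of text lines extracted on that page."""
    return {
        page.page_number: [line.content for line in (page.lines or [])]
        for page in result.pages
    }


# Offline stand-in that mimics the attribute names used above.
# With the SDK, pass the real object returned by poller.result() instead.
fake_result = SimpleNamespace(
    pages=[
        SimpleNamespace(
            page_number=1,
            lines=[
                SimpleNamespace(content="Invoice"),
                SimpleNamespace(content="Total: 42"),
            ],
        )
    ]
)

for page_number, lines in lines_by_page(fake_result).items():
    print(f"Page {page_number}: {lines}")
```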

 

 

 

 

 

3. Follow the OCR REST API QuickStart. All it takes is two operations to extract the text.

 

 

 

 

 

curl -v -i -X POST "{endpoint}/formrecognizer/documentModels/prebuilt-read:analyze?api-version=2022-08-31" -H "Content-Type: application/json" -H "Ocp-Apim-Subscription-Key: {key}" --data-ascii "{\"urlSource\": \"https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/rest-api/read.png\"}"
# The POST response includes an Operation-Location header; its URL contains the {resultId} used in the GET call below.
curl -v -X GET "{endpoint}/formrecognizer/documentModels/prebuilt-read/analyzeResults/{resultId}?api-version=2022-08-31" -H "Ocp-Apim-Subscription-Key: {key}"
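The same two operations can be sketched in Python with only the standard library. This is an illustrative outline rather than official sample code. The helper names, the one-second polling delay, and the polling budget are assumptions. It also assumes the response body carries a `status` field that reaches `succeeded` or `failed`, with the extracted text under `analyzeResult`.

```python
import json
import time
import urllib.request
from urllib.parse import urlparse

API_VERSION = "2022-08-31"


def result_id_from_operation_location(operation_location):
    """Return the last path segment of the Operation-Location URL (the result ID)."""
    path = urlparse(operation_location).path
    return path.rstrip("/").rsplit("/", 1)[-1]


def start_read(endpoint, key, document_url):
    """Operation 1: submit the document URL and return the Operation-Location header."""
    url = (
        f"{endpoint.rstrip('/')}/formrecognizer/documentModels/"
        f"prebuilt-read:analyze?api-version={API_VERSION}"
    )
    request = urllib.request.Request(
        url,
        data=json.dumps({"urlSource": document_url}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": key,
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.headers["Operation-Location"]


def fetch_json(url, key):
    """Operation 2: GET the analysis status and result as parsed JSON."""
    request = urllib.request.Request(url, headers={"Ocp-Apim-Subscription-Key": key})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_result(operation_location, key, fetch=fetch_json,
                    delay_seconds=1.0, max_polls=60):
    """Poll until the analysis finishes, then return the analyzeResult object."""
    for _ in range(max_polls):
        payload = fetch(operation_location, key)
        status = payload.get("status")
        if status == "succeeded":
            return payload["analyzeResult"]
        if status == "failed":
            raise RuntimeError(f"Analysis failed: {payload.get('error')}")
        time.sleep(delay_seconds)  # still queued or running; try again shortly
    raise TimeoutError("Analysis did not finish within the polling budget")


# Usage (needs a real Form Recognizer resource, so it is left commented out):
# location = start_read("<endpoint>", "<key>", "<document-url>")
# print(wait_for_result(location, "<key>")["content"])
```

Passing the fetch function as a parameter keeps the polling logic testable without network access.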

 

 

 

 

 

Customer success – Fujitsu


Fujitsu is the world leader in document scanning technology, with more than 50 percent of global market share, but that doesn't stop the company from constantly innovating. To improve the performance and accuracy of its cloud scanning solution, Fujitsu incorporated Azure Form Recognizer’s OCR technology. It took only a few months to deploy the new technologies, and they have boosted character recognition rates as high as 99.9 percent. This collaboration helps Fujitsu deliver market-leading innovation and give its customers powerful and flexible tools for end-to-end document management.

 

Learn more about Fujitsu’s OCR story and other Form Recognizer customer successes.

Additional resources

The Form Recognizer v3.0 announcement article covers all new capabilities and enhancements. Be sure to check it out. Refer to the following resources to learn more and get started.

  1. Form Recognizer Read OCR model overview
  2. How to use the Read OCR model
  3. Form Recognizer What’s New in version 3.0
  4. Form Recognizer overview
  5. Form Recognizer QuickStart

Last update: Jan 25 2024 08:17 AM