Overview
Businesses today are applying Optical Character Recognition (OCR) and document AI technologies to rapidly convert their large troves of documents and images into actionable insights. These insights power robotic process automation (RPA), knowledge mining, and industry-specific solutions. However, there are several challenges to successfully implementing these scenarios at scale.
The challenge
First, your customers are global, and so is their content, so your systems should also read and speak international languages. Nothing is more frustrating than failing to reach your global customers because their native languages are not supported.
Secondly, your documents are large, potentially hundreds or even thousands of pages long. To complicate things, they mix printed and handwritten text in the same document. To make matters worse, they contain multiple languages in the same document, possibly even in the same line.
Thirdly, you are a business that’s trusted by your customers to protect their data and information. If your customers are in industries such as healthcare, insurance, banking, and finance, you have stringent data privacy and security needs. You need the flexibility to deploy your solutions on the world’s most trusted cloud or on-premises within your environment.
Finally, you should not have to choose between world-class AI quality, world languages support, and deployment on cloud or on-premises.
Computer Vision OCR (Read API)
Microsoft’s Computer Vision OCR (Read) technology is available as a Cognitive Services Cloud API and as Docker containers. Customers use it in diverse scenarios on the cloud and within their networks to help automate image and document processing.
What’s New
We are announcing Computer Vision's Read API v3.2 public preview as a cloud service and Docker container. It includes the following updates:
- OCR for 73 languages including Simplified and Traditional Chinese, Japanese, Korean, and several Latin languages.
- Natural reading order for the text line output.
- Handwriting style classification for text lines.
- Text extraction for selected pages for a multi-page document.
- Available as a Distroless container for on-premises deployment.
First wave of language expansion
With the latest Read preview version, we are announcing OCR support for 73 languages, including Chinese Simplified, Chinese Traditional, Japanese, Korean, and several Latin languages, a 10x increase from the Read 3.1 GA version.
Read uses a universal model, so you can extract text in any of these languages by calling the Read API without the optional language parameter. We recommend omitting the language parameter whenever you are unsure of the language of the input document or image at run time.
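As a minimal sketch of that choice, the following Python helper builds the analyze URL with or without a language hint. The helper name build_analyze_url is hypothetical, and both the query parameter name "language" and the example code "ja" are assumptions, so check them against the API reference before relying on them.

```python
from typing import Optional
from urllib.parse import urlencode


def build_analyze_url(endpoint: str, language: Optional[str] = None) -> str:
    """Return the Read analyze URL, adding ?language=... only when a hint is given."""
    base = endpoint.rstrip("/") + "/vision/v3.2-preview.2/read/analyze"
    if language is None:
        # No hint: leave the parameter out entirely.
        return base
    # Assumption: the hint is passed as a "language" query parameter.
    return base + "?" + urlencode({"language": language})


endpoint = "https://westcentralus.api.cognitive.microsoft.com"
print(build_analyze_url(endpoint))                 # language unknown at run time
print(build_analyze_url(endpoint, language="ja"))  # language known in advance
```

Leaving the parameter out by default matches the recommendation above; a caller passes a language only when it is known ahead of time.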
For the full list of languages supported by the latest Read preview, see the list of OCR supported languages under Get Started.
For example, once you have created a Computer Vision resource, the following curl code will call the Read 3.2 preview with the sample image.
Make the following changes in the command where needed:
- Replace the value of <subscription key> with your subscription key.
- Replace the first part of the request URL (westcentralus) with the text in your own endpoint URL.
curl -v -X POST "https://westcentralus.api.cognitive.microsoft.com/vision/v3.2-preview.2/read/analyze" -H "Content-Type: application/json" -H "Ocp-Apim-Subscription-Key: <subscription key>" --data-ascii "{\"url\":\"https://upload.wikimedia.org/wikipedia/commons/thumb/a/af/Atomist_quote_from_Democritus.png/338px-Atomist_quote_from_Democritus.png\"}"
The response will include an Operation-Location header, whose value is a unique URL. You use this URL to query the results of the Read operation. The URL expires in 48 hours. The last path segment of that URL is the {operationId} used in the following GET request.
curl -v -X GET "https://westcentralus.api.cognitive.microsoft.com/vision/v3.2-preview.2/read/analyzeResults/{operationId}" -H "Ocp-Apim-Subscription-Key: <subscription key>"
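The submit-then-query flow above can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, the status values ("notStarted", "running", "succeeded", "failed") are assumptions about the response body, and the HTTP call is replaced by a caller-supplied function so the sketch runs offline.

```python
import time
from typing import Any, Callable, Dict


def operation_id_from_location(operation_location: str) -> str:
    """Return the last path segment of an Operation-Location URL."""
    return operation_location.rstrip("/").rsplit("/", 1)[-1]


def poll_until_done(
    fetch_result: Callable[[], Dict[str, Any]],
    interval_seconds: float = 1.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch_result until the status is terminal or attempts run out."""
    for _ in range(max_attempts):
        result = fetch_result()
        # Assumed terminal status values: "succeeded" and "failed".
        if str(result.get("status", "")).lower() in ("succeeded", "failed"):
            return result
        sleep(interval_seconds)
    raise TimeoutError("Read operation did not finish within the allowed attempts")


# Offline demonstration with canned responses instead of real HTTP calls.
location = (
    "https://westcentralus.api.cognitive.microsoft.com"
    "/vision/v3.2-preview.2/read/analyzeResults/example-operation-id"
)
canned = iter([{"status": "running"}, {"status": "succeeded"}])
final = poll_until_done(lambda: next(canned), sleep=lambda seconds: None)
print(operation_id_from_location(location), final["status"])
```

In real use, fetch_result would issue the GET request shown above with the Ocp-Apim-Subscription-Key header and return the parsed JSON body.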
Natural reading order output (Latin languages)
OCR services typically return text lines in a fixed order. With the new Read preview, you can choose to get the text lines in natural reading order instead of the default left-to-right, top-to-bottom order. Use the new readingOrder query parameter with the value “natural” for a more human-friendly reading order, as shown in the following example.
The following visualization of the JSON-formatted service response shows the text line order for the same document. Note that the text lines of the first column are output in order, followed by those of the second column and finally the third.
For example, the following curl code sample calls the Read 3.2 preview to analyze the sample newsletter image and output a natural reading order for the extracted text lines.
curl -v -X POST "https://westcentralus.api.cognitive.microsoft.com/vision/v3.2-preview.2/read/analyze?readingOrder=natural" -H "Content-Type: application/json" -H "Ocp-Apim-Subscription-Key: <subscription key>" --data-ascii "{\"url\":\"https://docs.microsoft.com/en-us/microsoft-365-app-certification/media/dec01.png\"}"
The response will include an Operation-Location header, whose value is a unique URL. You use this URL to query the results of the Read operation.
curl -v -X GET "https://westcentralus.api.cognitive.microsoft.com/vision/v3.2-preview.2/read/analyzeResults/{operationId}" -H "Ocp-Apim-Subscription-Key: <subscription key>"
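Once the result is available, the text lines can be read in the order the service returned them. The Python sketch below assumes the result body nests readResults under an analyzeResult object and that each line carries a text field; neither name is confirmed by this post, and the sample data is made up for illustration.

```python
from typing import Any, Dict, List


def lines_in_returned_order(result: Dict[str, Any]) -> List[str]:
    """Collect line texts page by page, preserving the order the service returned."""
    texts: List[str] = []
    # Assumed shape: result["analyzeResult"]["readResults"][i]["lines"][j]["text"].
    for page in result.get("analyzeResult", {}).get("readResults", []):
        for line in page.get("lines", []):
            texts.append(line.get("text", ""))
    return texts


# Made-up sample standing in for a response requested with readingOrder=natural.
sample = {
    "analyzeResult": {
        "readResults": [
            {
                "page": 1,
                "lines": [
                    {"text": "Column one, first line"},
                    {"text": "Column one, second line"},
                    {"text": "Column two, first line"},
                ],
            }
        ]
    }
}
print("\n".join(lines_in_returned_order(sample)))
```

The helper does no sorting of its own, so whatever order the readingOrder parameter produced is the order the caller sees.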
Handwriting style classification (Latin languages)
When you apply OCR to business forms and applications, it’s useful to know which parts of the form contain handwritten text so that they can be handled differently. For example, the comments and signature areas of agreements typically contain handwritten text. With the latest Read preview, the service classifies each text line as handwriting style or not, along with a confidence score. This classification applies to Latin languages only.
For example, in the following image, you see the appearance object in the JSON response with the style classified as handwriting along with a confidence score.
The following code analyzes the sample handwritten image with the Read 3.2 preview.
curl -v -X POST "https://westcentralus.api.cognitive.microsoft.com/vision/v3.2-preview.2/read/analyze" -H "Content-Type: application/json" -H "Ocp-Apim-Subscription-Key: <subscription key>" --data-ascii "{\"url\":\"https://intelligentkioskstore.blob.core.windows.net/visionapi/suggestedphotos/2.png\"}"
The response will include an Operation-Location header, whose value is a unique URL. You use this URL to query the results of the Read operation.
curl -v -X GET "https://westcentralus.api.cognitive.microsoft.com/vision/v3.2-preview.2/read/analyzeResults/{operationId}" -H "Ocp-Apim-Subscription-Key: <subscription key>"
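To act on the classification, a client can filter the returned lines by style. The Python sketch below assumes each line has an appearance object shaped like {"style": {"name": "handwriting", "confidence": 0.9}} and that results sit under analyzeResult.readResults; these field names and the 0.5 threshold are assumptions for illustration, and the sample data is made up.

```python
from typing import Any, Dict, List, Tuple


def handwritten_lines(
    result: Dict[str, Any], min_confidence: float = 0.5
) -> List[Tuple[int, str, float]]:
    """Return (page, text, confidence) for lines classified as handwriting."""
    found: List[Tuple[int, str, float]] = []
    for page in result.get("analyzeResult", {}).get("readResults", []):
        for line in page.get("lines", []):
            # Assumed shape: line["appearance"]["style"] has "name" and "confidence".
            style = line.get("appearance", {}).get("style", {})
            confidence = float(style.get("confidence", 0.0))
            if style.get("name") == "handwriting" and confidence >= min_confidence:
                found.append((page.get("page"), line.get("text", ""), confidence))
    return found


# Made-up sample with one printed line and one handwritten line.
sample = {
    "analyzeResult": {
        "readResults": [
            {
                "page": 1,
                "lines": [
                    {"text": "Applicant name",
                     "appearance": {"style": {"name": "other", "confidence": 0.95}}},
                    {"text": "Jane Doe",
                     "appearance": {"style": {"name": "handwriting", "confidence": 0.9}}},
                ],
            }
        ]
    }
}
print(handwritten_lines(sample))
```

Returning the confidence alongside the text lets downstream code route low-confidence lines to manual review instead of discarding them.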
Extract text from select pages of a document
Many standard business forms have fillable sections followed by long informational sections that are identical across documents and across versions of those documents. At other times, you will want to apply OCR only to specific pages of interest for business-specific reasons. In both cases, the new pages input parameter lets you limit text extraction to the pages you need.
The following curl code sample calls the Read 3.2 preview to analyze the financial report PDF document with the pages input parameter set to the page range, "3-5".
curl -v -X POST "https://westcentralus.api.cognitive.microsoft.com/vision/v3.2-preview.2/read/analyze?pages=3-5" -H "Content-Type: application/json" -H "Ocp-Apim-Subscription-Key: <subscription key>" --data-ascii "{\"url\":\"https://www.annualreports.com/HostedData/AnnualReports/PDF/NASDAQ_MSFT_2019.pdf\"}"
The response will include an Operation-Location header, whose value is a unique URL. You use this URL to query the results of the Read operation.
curl -v -X GET "https://westcentralus.api.cognitive.microsoft.com/vision/v3.2-preview.2/read/analyzeResults/{operationId}" -H "Ocp-Apim-Subscription-Key: <subscription key>"
The following JSON extract shows the resulting OCR output with the text extracted from pages 3, 4, and 5. You should see similar output for your own documents.
"readResults": [
  {
    "page": 3,
    "angle": 0,
    "width": 8.5,
    "height": 11,
    "unit": "inch",
    "lines": []
  },
  {
    "page": 4,
    "angle": 0,
    "width": 8.5,
    "height": 11,
    "unit": "inch",
    "lines": []
  },
  {
    "page": 5,
    "angle": 0,
    "width": 8.5,
    "height": 11,
    "unit": "inch",
    "lines": []
  }
]
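A quick way to confirm that the service returned exactly the pages you asked for is to expand the pages value and compare it with the page numbers in readResults. The Python sketch below is illustrative: the helper names are hypothetical, and support for comma-separated lists such as "1,3-5" is an assumption, since this post only shows the "3-5" range form.

```python
from typing import Any, Dict, List


def expand_pages(spec: str) -> List[int]:
    """Expand a pages value such as "3-5" into [3, 4, 5]."""
    pages: List[int] = []
    # Assumption: several pages or ranges may be separated by commas.
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid page range: {part}")
            pages.extend(range(start, end + 1))
        else:
            pages.append(int(part))
    return pages


def returned_pages(read_results: List[Dict[str, Any]]) -> List[int]:
    """List the page numbers present in a readResults array."""
    return [entry["page"] for entry in read_results]


# readResults trimmed to the fields needed for the check.
read_results = [{"page": 3, "lines": []}, {"page": 4, "lines": []}, {"page": 5, "lines": []}]
print(expand_pages("3-5") == returned_pages(read_results))
```

Comparing the two lists catches cases where a requested page is missing from the output, for example when the range runs past the end of the document.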
On-premises option with Distroless container
The Read 3.2 preview OCR container provides:
- All features from the Read cloud API preview
- Distroless container release
- Performance and memory enhancements
Install and run the Read containers to get started and find the recommended configuration settings.
Get Started
- Create a Computer Vision resource in Azure.
- Follow our SDK and REST API QuickStarts.
- Learn more about OCR (Read) and Form Recognizer.
- See the list of OCR supported languages.
- Learn more about the Read containers and download them from Docker Hub.
- Write to us at formrecog_contact@microsoft.com