Businesses today are applying Optical Character Recognition (OCR) and document AI technologies to rapidly convert their large troves of documents and images into actionable insights. These insights power robotic process automation (RPA), knowledge mining, and industry-specific solutions. However, there are several challenges to successfully implementing these scenarios at scale.
First, your customers are global, and their content is global, so your systems should also read and understand international languages. Nothing is more frustrating than failing to reach your global customers for lack of support for their native languages.
Second, your documents are large, with potentially hundreds or even thousands of pages. To complicate things, they mix printed and handwritten text in the same document. To make matters worse, they contain multiple languages in the same document, possibly even on the same line.
Third, you are a business that your customers trust to protect their data and information. If your customers are in industries such as healthcare, insurance, banking, and finance, you have stringent data privacy and security needs. You need the flexibility to deploy your solutions on the world's most trusted cloud or on-premises within your own environment.
Finally, you should not have to choose between world-class AI quality, support for the world's languages, and deployment on the cloud or on-premises.
Computer Vision OCR (Read API)
Microsoft’s Computer Vision OCR (Read) technology is available as a Cognitive Services Cloud API and as Docker containers. Customers use it in diverse scenarios on the cloud and within their networks to help automate image and document processing.
We are announcing Computer Vision's Read API v3.2 public preview as a cloud service and Docker container. It includes the following updates:
OCR for 73 languages including Simplified and Traditional Chinese, Japanese, Korean, and several Latin languages.
Natural reading order for the text line output.
Handwriting style classification for text lines.
Text extraction for selected pages for a multi-page document.
With the latest Read preview version, we are announcing OCR support for 73 languages, including Simplified and Traditional Chinese, Japanese, Korean, and several Latin languages, a 10x increase over the Read 3.1 GA version.
Thanks to Read's universal model, you can extract text in any of these languages by calling the Read API without the optional language parameter. We recommend omitting the language parameter whenever you are unsure of the language of the input document or image at run time.
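As a sketch, a Read request that relies on the universal model simply omits the language query parameter. The endpoint, key, image URL, and exact preview version path below are placeholders, not values from this post; substitute your own resource's details.

```python
import json
from urllib.request import Request

# Placeholder values -- substitute your own Computer Vision resource.
ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"
KEY = "<your-key>"

# No "language" query parameter: the universal model detects and
# extracts text across the supported languages automatically.
url = f"{ENDPOINT}/vision/v3.2-preview/read/analyze"
request = Request(
    url,
    method="POST",
    headers={
        "Ocp-Apim-Subscription-Key": KEY,
        "Content-Type": "application/json",
    },
    data=json.dumps({"url": "https://example.com/sample-document.png"}).encode(),
)
print(request.full_url)
```

Because Read is asynchronous, the POST returns an Operation-Location header that you then poll for the results.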
The latest Read preview supports the following languages:
OCR services typically emit text lines in a fixed order. With the new Read preview, you can choose to receive text lines in natural reading order instead of the default left-to-right, top-to-bottom ordering. Set the new readingOrder query parameter to "natural" for a more human-friendly output, as shown in the following example.
The following visualization of the JSON-formatted service response shows the text line order for the same document. Note how the first column's text lines are output in order before the second column's, followed by the third column's.
OCR Read order example
For example, calling the Read 3.2 preview on the sample newsletter image with readingOrder set to "natural" returns the extracted text lines in natural reading order.
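The shape of that request can be sketched as follows; the endpoint and the exact preview version path are placeholders, not values from this post.

```python
from urllib.parse import urlencode

# Placeholder endpoint -- substitute your own Computer Vision resource.
ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"

# readingOrder=natural asks the service to order text lines the way a
# human would read the page, instead of the default strictly
# left-to-right, top-to-bottom ordering.
params = urlencode({"readingOrder": "natural"})
url = f"{ENDPOINT}/vision/v3.2-preview/read/analyze?{params}"
print(url)
```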
When you apply OCR to business forms and applications, it's useful to know which parts of a form contain handwritten text so that they can be handled differently. For example, the comments and signature areas of agreements typically contain handwritten text. With the latest Read preview, the service classifies each text line (Latin languages only) as handwritten style or not, along with a confidence score.
For example, the following image shows the appearance object in the JSON response, with the style classified as handwriting along with a confidence score.
OCR handwriting style classification for text lines
Many standard business forms have fillable sections followed by long informational sections that are identical across documents and document versions. At other times, you may want to apply OCR only to specific pages of interest for business reasons.
For example, you can call the Read 3.2 preview on a financial report PDF with the pages input parameter set to the page range "3-5", extracting text from only those pages.
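A sketch of such a request follows; the endpoint, key, document URL, and exact preview version path are placeholders, not values from this post.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder values -- substitute your own Computer Vision resource.
ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"
KEY = "<your-key>"

# pages=3-5 limits text extraction to pages 3 through 5 of the input PDF.
query = urlencode({"pages": "3-5"})
request = Request(
    f"{ENDPOINT}/vision/v3.2-preview/read/analyze?{query}",
    method="POST",
    headers={
        "Ocp-Apim-Subscription-Key": KEY,
        "Content-Type": "application/json",
    },
    data=json.dumps({"url": "https://example.com/financial-report.pdf"}).encode(),
)
print(request.full_url)
```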