Automate document analysis with Azure Form Recognizer using AI and OCR
Published Apr 12 2023

Extract text automatically from forms, structured or unstructured documents, and text-based images at scale with AI and OCR using Azure’s Form Recognizer service and the Form Recognizer Studio. Build intelligent document processing apps using Azure AI services. Leverage pre-trained models or build your own custom models to help speed up app development with the SDK and APIs. 

 

[Image: Main.png]

 

  • Use a pre-built model for W2 forms
  • Build and train custom models to extract data from 1099, 1040, and W9 forms
  • Extract data from receipts with handwritten tips, in different languages, currencies, and date formats

Bema Bonsu, from the Azure AI engineering team, joins Jeremy Chapman to share updates to custom app experiences for document processing.

 

Automate your tax process.

[Image: 1- taxes.png]

 

Use a pre-built model for W2 forms & train it to handle others. Watch this Form Recognizer Studio demo.

 

Track expenses with pre-built models.

[Image: 2- receipts.png]

 

Extract data from receipts with handwritten tips, in different languages, currencies, and date formats. Check it out in Form Recognizer Studio.

 

Build and train custom models.

[Image: 3- Custom.png]

 

Extract data from 1099, 1040, and W9 forms in Azure Form Recognizer. See it here.

 

Watch our video here.

 


QUICK LINKS: 

00:00 — Introduction 

01:50 — Form Recognizer updates 

03:27 — Extract data from tax forms 

04:56 — Extract data from handwritten text and other languages 

06:00 — Build and train custom models 

08:27 — Extract data from unstructured docs 

10:15 — Code behind custom apps 

12:07 — Wrap up

 

Link References: 

Get started with Azure Form Recognizer Studio at https://aka.ms/formrecognizerstudio 

 

Find out more about Contracts 365 and their services at: https://www.contracts365.com/ 

 

H&R Block and how they use Form Recognizer: https://techcommunity.microsoft.com/t5/ai-applied-ai-blog/automating-tax-documents-processing-with-a... 

 

Unfamiliar with Microsoft Mechanics? 

Microsoft Mechanics is Microsoft’s official video series for IT. You can watch and share valuable content and demos of current and upcoming tech from the people who build it at Microsoft.

 


Video Transcript:

- Today we’ll look at the latest updates for building intelligent document processing apps using Azure AI services to extract text automatically from forms, structured or even unstructured documents, and text-based images at scale, from using Azure’s Form Recognizer Studio to leverage pre-trained models or bring and build your own, to using its raw data and code outputs to help speed up your development of custom apps. And joining us today from the Azure AI engineering team is Bema Bonsu. Welcome to Mechanics.

 

- Happy to be here.

 

- And thanks so much for joining us today, Bema. You know, document processing is one of those areas that’s always been ripe for AI. It removes the manual effort needed to understand and classify documents as part of everyday information management or more custom solutions in areas like contract management or tax processing. So how are we moving this area forward?

 

- You are right. This is definitely a prime area for AI. If you think about the different structured and unstructured formats, content types, different spoken languages, handwritten inputs, and custom data fields that exist across different domains, it would be impossible to handle all of this without leveraging machine learning and AI. A big focus for us has been on the precise application of AI to continually improve document understanding capabilities that you can call programmatically from Azure AI services and integrate with your app experiences regardless of where you are in the world.

 

- To that point, by the way, Azure AI services are in fact foundational to the intelligent document processing capabilities that are in a number of Microsoft services today, whether that’s information management in Microsoft Syntex where business users can point to whole document libraries and Syntex will just scan, analyze, and process items at scale, or you can use low-code document processing with AI Builder in the Power Platform as part of apps and workflows. And in Azure, all of this comes together with the Form Recognizer service for custom apps. So can you walk us through the core experiences and some of the recent updates?

 

- Sure, so for custom apps, there are a number of key updates that make document understanding better. I’m in Azure Form Recognizer Studio. I’m going to point out a few here. First, you’ll see we’ve added more document analysis capabilities, allowing you to extract text, tables, and key-value pairs. We’ve also added updates to aid with navigation, search, and table labeling. Beyond that, we have also extended our list of pre-built models for tax documents and business cards. There are now more models as well as region-specific content like invoices, receipts, IDs, and health insurance cards. We are also offering a new pre-built contract model that you can use to extract signatory information. And our custom extraction models, which continually get better, let you extract key-value pairs, selection marks, tables, signature fields, selected regions, and just about any other text or text-based image patterns you define.
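
For developers following along, here’s a minimal sketch of calling that document analysis capability from the Python SDK (azure-ai-formrecognizer 3.x); the endpoint, key, and file name are placeholders, not values from the demo.

```python
from azure.core.credentials import AzureKeyCredential
from azure.ai.formrecognizer import DocumentAnalysisClient

# Placeholder endpoint and key; use your own Form Recognizer resource values.
client = DocumentAnalysisClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>"),
)

# The prebuilt general document model returns text, tables, and key-value pairs.
with open("sample.pdf", "rb") as f:
    poller = client.begin_analyze_document("prebuilt-document", document=f)
result = poller.result()

# Key-value pairs detected across the document.
for kv in result.key_value_pairs:
    key = kv.key.content if kv.key else ""
    value = kv.value.content if kv.value else ""
    print(f"{key}: {value}")

# Tables come back with row/column indices for each cell.
for table in result.tables:
    print(f"Table with {table.row_count} rows x {table.column_count} columns")
```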

 

- And these models are really great, by the way, for understanding both structured and unstructured documents like financial reports, insurance claims, proposals, mortgage documents, and more.

 

- They are. And it’s worth pointing out that as a global service, we support more than 300 spoken languages. I’m also really excited about the additional work we are doing for document extraction and search with the integration of large language models using Azure OpenAI. For example, using the General Document model, you can use natural language to describe the values within documents that you might be looking for, and it will identify the content even if your documents contain slight variations. Also, in the area of document classification, the service will now identify document types within the same file and split those out as separate components for analysis.
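
As a rough sketch of that classify-and-split flow, assuming a custom classifier has already been trained (the classifier ID below is hypothetical), the SDK exposes a classify call whose results report a type, confidence, and page span per detected document:

```python
# Hypothetical classifier ID; reuses the `client` from the sketch above.
with open("mixed-tax-docs.pdf", "rb") as f:
    poller = client.begin_classify_document("tax-doc-classifier", document=f)
result = poller.result()

# Each detected document carries its type, confidence, and page span,
# which is what allows splitting one file into separate components.
for doc in result.documents:
    pages = [region.page_number for region in doc.bounding_regions]
    print(f"{doc.doc_type} (confidence {doc.confidence:.2f}) on pages {pages}")
```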

 

- And the improved reasoning, by the way, over unstructured content is something that I know a lot of people watching are going to be excited about. So I’d love to see how this works.

 

- Of course, and I’ve got a very timely example for you. For anyone living in the US dealing with tax season, you’ve probably heard of the tax preparation company H&R Block. They prepare more than 20 million tax returns each year. It’s not uncommon for many of their customers to literally show up with piles of printed paper tax forms and receipts. No one wants to manually enter those items. For the last couple of years, they’ve turned to Azure AI services to automate many of their processes. Form Recognizer can work across tax forms to extract data and help automate that process. In the US, we have common tax forms like W2s, 1099s, 1040s, and W-9s that we use to file taxes. Form Recognizer has a pre-built model for W2s and you can easily train it to handle the other forms, so we’ll start there. In Form Recognizer Studio, we have sample W2 forms preloaded, as you can see here on the left. The first one is an image scan from a paper form, which you can see from the scanned text. And the second one is a lot clearer, like a screenshot of a digital form. I’ll use the first one because the text and the markings are a little more difficult to read. Once I hit analyze, you’ll see all of the scanned fields on the right, along with percentage confidence levels. If I go into the result tab, it shows the output of the analyze step as a raw JSON file. I can use this JSON file to display the information in a custom app or enter the captured fields into a table or database. And in the code tab, I can see Python code here by default, which I can use in a custom app as a starting point. I can also view it as JavaScript or C#, depending on my needs.
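
The same W2 analysis can be reproduced in a few lines with the pre-built W-2 model (model ID prebuilt-tax.us.w2). This is a sketch that reuses the `client` from the earlier snippet; the file name is a placeholder:

```python
# Reuses the `client` from the earlier sketch; the file name is a placeholder.
with open("w2-scan.jpg", "rb") as f:
    poller = client.begin_analyze_document("prebuilt-tax.us.w2", document=f)
result = poller.result()

for w2 in result.documents:
    for name, field in w2.fields.items():
        # Each field carries its extracted content plus the confidence
        # level shown in the Studio's right-hand panel.
        print(f"{name}: {field.content} (confidence {field.confidence})")

# result.to_dict() yields the same raw JSON shown in the Studio's result tab.
```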

 

- And this also works for handwritten text and, like you said, across many different languages.

 

- Absolutely. Receipts for expenses often have handwritten elements and that’s where you can leverage the models pre-built in Form Recognizer Studio. Here, you can see a few common receipt types for retail goods, hotel accommodation, food service with handwritten tips, and receipts in other languages like French, Spanish, and Japanese, and these span different currencies and use different date formats. I’ll start with the receipt that has a handwritten tip, and you’ll see that even though the handwriting wasn’t super easy to read, the service is able to pull the data correctly throughout with high confidence. Now, let’s look at the Japanese receipt next. I’ll analyze it and you’ll see it’s extracted the address elements, the date using year/month/day format, and the transaction amounts. If I expand the items, you’ll see that each line item was correctly scanned and outputted as correct Japanese characters, even though a few of the lines in the characters are thin and hard to read. With these pre-built models and samples, it’s super easy to get started and you can even upload your own examples to verify that they’re extracted well.
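
Here’s a sketch of the same receipt scenario with the pre-built receipt model, again reusing the `client` from the earlier snippet; the file name is a placeholder, and the Tip, Total, and Items field names follow the prebuilt-receipt schema:

```python
# Reuses the `client` from the earlier sketch.
with open("receipt-with-tip.jpg", "rb") as f:
    poller = client.begin_analyze_document("prebuilt-receipt", document=f)
result = poller.result()

for receipt in result.documents:
    tip = receipt.fields.get("Tip")
    total = receipt.fields.get("Total")
    if tip:
        print(f"Tip (handwritten): {tip.value} (confidence {tip.confidence})")
    if total:
        print(f"Total: {total.value}")
    # Line items come back as nested fields, one entry per scanned line.
    items = receipt.fields.get("Items")
    if items:
        for item in items.value:
            desc = item.value.get("Description")
            price = item.value.get("TotalPrice")
            print(f"  {desc.content if desc else '?'}: {price.value if price else '?'}")
```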

 

- Right, and this worked really well even for double-byte characters that were scanned on printed paper. Now, you mentioned there were forms like Form 1099 and also 1040 and W-9 and, you know, those aren’t built into the service. So how difficult would it be to add one of those forms into the service?

 

- It’s actually really easy. The best thing to do is to start with a couple of forms you’re trying to scan. I’ve pre-scanned some W-9s in advance. So let me show you that experience. Back in the Form Recognizer Studio, I’ll choose to create a custom model this time. Here’s where you’ll return to your custom projects list if you have a few already. I’ll create a new project, give it a name, W-9, choose my subscription, resource group, and resource along with a few more fields for storage. The storage account is used to store the training data set that you create. From there, I can start building and training new models. In the custom model, you’ll see that there are tabs for Label Data, Models, Test, and Settings. The documents you see here are my scanned W-9 forms. You’ll need at least five examples to get started. I’ll select one of my documents and start by selecting the plus sign on the right and adding a field. These can be text, key-value pairs, or selection marks. Selection marks can be used, for example, with the tax classification section and the checkmarks inside. You can also define signatures or tables. I’ll start by choosing a text field and just create one for name. And from here, you’ll repeat this process for other text, signature, or table fields, and once you’ve added all of the fields you need, you can start to identify the parts in the document that correspond to your fields.

- So to save a little time, you’ll see that I have a dozen or so fields and a signature defined. Now I can match each of the fields in the document with my labels. I just need to click on the word in the document and assign it to the corresponding field from the list. You’ll need to repeat this step for every field in the document for each of the five documents in this case. Then once you’re finished, you can train your model. I’ll hit train, then I’ll give my model a name, W-9model. I’ll skip the description, choose the model type, in this case, Neural, and this process will take several minutes. And when it’s done, you can test it out. So from the test tab, I’ll upload another two sample forms, and just like we did before with the pre-built models, I’ll just need to hit analyze. When it’s finished, you can see that it has successfully detected all of our fields with our defined labels. And just like I showed before, I can get to the JSON results and the code to add this to a custom app or workflow. As you can see, it took just a few minutes to build a custom neural model that’ll extract the data automatically from any other scanned W-9 forms I may have.
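
The train step can also be driven from code. Here’s a minimal sketch with the SDK’s administration client, assuming the labeled W-9 training data is already in a blob container; the endpoint, key, and SAS URL are placeholders:

```python
from azure.core.credentials import AzureKeyCredential
from azure.ai.formrecognizer import (
    DocumentModelAdministrationClient,
    ModelBuildMode,
)

# Placeholder endpoint and key for your Form Recognizer resource.
admin_client = DocumentModelAdministrationClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>"),
)

# Builds the same Neural model type chosen in the Studio; training takes
# several minutes and the poller waits for completion.
poller = admin_client.begin_build_document_model(
    ModelBuildMode.NEURAL,
    blob_container_url="<sas-url-to-labeled-training-data>",
    model_id="W-9model",
)
model = poller.result()
print(model.model_id, model.created_on)
```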

 

- Right, and because, by nature, taxes are pretty seasonal, this is also a great example of how you take advantage of the elasticity of the cloud to really scale out when you need to. And you also mentioned that it’s possible then to extract data from unstructured documents, right?

 

- That is right. Contract processing has been available for a few months as a pre-built neural document model. This extracts the important elements of a contract as key-value pairs to speed up many of the workflows around contract management. Let me show you how Contracts 365, a leading provider of contract management software, is leveraging our AI services to do this. So this is the Contracts 365 experience for requesting new counterparties, contracts, and documents. I’ll start with requesting a new contract. I can select the request type to request a contract from a template, file an executed contract, or review a third-party contract. Request Priority lets me set system conditions for how quickly a contract needs to be processed. These conditions can force the user to supply additional information per company policy. I’ll review a new third-party contract with an urgent one-to-two-day priority. The AI processing will extract the contract record type, in this case, a clinical trial agreement, and I’ll upload the contract as a PDF, which will initiate AI processing. Here, Contracts 365 uses Microsoft AI to extract contract data and then transforms, validates, and visualizes the data in the system with its AI engine. You can see that it has extracted the contract value and other important metadata about the clinical trial and its status and placed the data in the corresponding fields. Contracts 365 lets you then validate what has been inferred with highlighted citations in the PDF file. And here you can see it also finds counterparty data automatically and accurately extracts all the relevant title and address fields and populates the data in the backend system. So as you can see, the contract was completely unstructured, and this is a huge benefit of leveraging a large language model. These models can easily process the various structures and terms people might use in a document like this.
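
For reference, here’s a sketch of calling the pre-built contract model this scenario builds on (model ID prebuilt-contract), again reusing the earlier `client`; the file name is a placeholder:

```python
# Reuses the `client` from the earlier sketch.
with open("clinical-trial-agreement.pdf", "rb") as f:
    poller = client.begin_analyze_document("prebuilt-contract", document=f)
result = poller.result()

for contract in result.documents:
    # Extracted fields include parties, dates, and signatory information
    # as key-value pairs.
    for name, field in contract.fields.items():
        print(f"{name}: {field.content} (confidence {field.confidence})")
```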

 

- So as a developer, how would I use some of these experiences with Form Recognizer in my own custom apps?

 

- Let me show you a custom app we’ve built, what it does, and the code behind it. Here, I’ve built a simple app for processing invoices. I can see some basic stats about the number of documents processed, and below that are some details from a few processed invoices. Now, let me show you an invoice to get an idea of what we’re processing. Pay attention to the fields, like the invoice number and the amounts here from this invoice. Now I’ll head back over to my app and I’ll upload the PDF file we just saw. I’ll first select the file, then I’ll choose the model. In this case, it’s a pre-built invoice model, but we have quite a few additional model options to choose from. I’ll go ahead and upload it. And while this is uploading and getting processed, let me show you where this data is going. Here, we’re using Cosmos DB. If you remember the JSON output I showed earlier in Form Recognizer Studio, this is essentially what we’re adding to Cosmos DB. I have the Data Explorer already open, so I’ll refresh my items list, and this is the file I just uploaded and processed. And there’s our JSON output, again, just like we saw before, with all the key-value pairs and fields we’ve extracted. Now, if I go back to the app, I can see a clean, formatted view of the data. And here it is in our dashboard view from before, right in the top row. And even though this is a pretty sophisticated app in terms of its document processing capabilities, the nice thing from a developer standpoint is just how easy it is to code and call the web services needed.

- Let me show you what’s behind it using Visual Studio Code. Notice this is a Python app and we’re using the Python SDK. These are the environment variables we’ve defined for Azure App Service. Here you can see we’re creating the clients we need. This is so we can send our data to blob storage and the results to the Cosmos DB. This is the code that handles the upload and stores the file in Azure Storage. Here is where the Form Recognizer client sends the document to the service for analysis. And this is where we’ve instructed it to save the outputted JSON results in Cosmos DB. Once everything is finished, we return the message of completion to the user. So in just a few lines of code, we are able to build a fully functioning solution.
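
As a condensed sketch of that flow, here’s roughly what the upload-analyze-store pipeline looks like in Python; the environment variable names, container names, and Cosmos DB database are hypothetical stand-ins for the App Service settings shown in the demo:

```python
import os
import uuid

from azure.core.credentials import AzureKeyCredential
from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.storage.blob import BlobServiceClient
from azure.cosmos import CosmosClient

# Hypothetical environment variable names standing in for the
# App Service settings shown in the demo.
form_client = DocumentAnalysisClient(
    endpoint=os.environ["FORM_RECOGNIZER_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["FORM_RECOGNIZER_KEY"]),
)
blob_service = BlobServiceClient.from_connection_string(os.environ["STORAGE_CONNECTION"])
cosmos = CosmosClient(os.environ["COSMOS_ENDPOINT"], os.environ["COSMOS_KEY"])
results = cosmos.get_database_client("invoices").get_container_client("results")

def process_invoice(path: str) -> dict:
    # 1. Store the uploaded file in blob storage.
    blob_name = f"{uuid.uuid4()}-{os.path.basename(path)}"
    with open(path, "rb") as f:
        blob_service.get_container_client("uploads").upload_blob(blob_name, f)

    # 2. Send the document to the prebuilt invoice model for analysis.
    with open(path, "rb") as f:
        poller = form_client.begin_analyze_document("prebuilt-invoice", document=f)
    analysis = poller.result()

    # 3. Persist the raw JSON output to Cosmos DB (assumes an /id partition key).
    item = {"id": blob_name, "analysis": analysis.to_dict()}
    results.upsert_item(item)
    return item
```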

 

- And I can see this being a massive time saver, you know, for processing documents at scale. Now, for anyone that’s watching right now looking to get started, what do you recommend?

 

- The best way is to try Azure AI services out. And for that, you’ll need the Azure Form Recognizer Studio. If you have an Azure account set up, you can get to it by going to aka.ms/formrecognizerstudio. Then after that, to integrate Form Recognizer into a custom application or workflow, check out our guided quickstart with all the steps documented in your programming language of choice, whether that’s C#, Java, Python, or our REST API.

 

- Thanks so much for joining us today, Bema. And, of course, keep checking back to Microsoft Mechanics for all the latest updates. Thanks for watching and we’ll see you next time.
