Using Azure AI Document Intelligence and Azure OpenAI to extract structured data from documents
Published Apr 09 2024 06:42 AM 4,266 Views
Microsoft

Addressing the challenges of efficient document processing, explore a novel solution to extract structured data from documents using Azure AI Document Intelligence and Azure OpenAI.

 

Context

 

In today’s data-driven landscape, efficient document processing is crucial for most organizations worldwide. Accurate document analysis is essential to provide much needed streamlining of business workflows to enhance productivity.

 

In this article, we’ll explore the key challenges that solution providers face with extracting relevant, structured data from documents. We'll also showcase a novel solution to solve these challenges using Azure AI Document Intelligence and Azure OpenAI.

 

Key challenges of effective document data extraction

 

ISVs and Digital Natives building document data extraction solutions often grapple with the complexities of finding a reliable mechanism to parse their customer’s documents. The key challenges include:

 

  • Variability in document layout. Documents, such as contracts or invoices, often contain similar data. However, they vary in both layout, structure, and language, including domain jargon.
  • Content in unstructured formats. It is common for pieces of useful information to be stored in unstructured formats, such as handwritten letters or emails.
  • Diversity in file formats. Solutions need to be able to handle a variety of formats that customers provide to them. This includes images, PDFs, Word documents, Excel spreadsheets, emails, and HTML pages.

With many Azure AI services to build solutions with, it can be difficult for teams to identify the best approach to resolve these challenges.

 

Benefits of using Azure AI Document Intelligence with Azure OpenAI

 

As solution providers for document data extraction capabilities, the following approach enables these benefits over other approaches:

 

  • No requirement to train a custom model. Combining these Azure AI services allows you to extract structured data without the need to train a custom model for the various document formats and layouts that your solution may receive. Instead, you tailor natural language prompts to your specific needs.
  • Define your own schema. The capabilities of GPT models enables you to extract data that matches or closely matches a schema that you define. This is a major benefit over alternative approach, particularly when each document’s domain jargon differs. This makes it easier to extract structured data accurately for your downstream processes post-extraction.
  • Out-of-the-box support for multiple file types. This approach supports a variety of document types, including PDFs, Office file types, HTML, and images. This flexibility allows you to extract structure data from a variety of sources without the need for custom logic in your application for each file type.

Let’s explore how to extract structured data from documents with both Azure AI Document Intelligence and Azure OpenAI in more detail.

 

Understanding layout analysis to Markdown with Azure AI Document Intelligence

 

Updated in March 2024, the pre-built layout model in Azure AI Document Intelligence gained new capabilities to extract content and structure from Office file types (Word, PowerPoint, and Excel) and HTML, alongside the existing PDF and image capabilities.

 

This introduced the capability for document processing solutions to take any document, such as a contract or invoice, with any layout or file format, and convert it into a structured Markdown output. This has the significant benefit of maintaining the content’s hierarchy when extracted.

 

This is important when we consider the capabilities of the Azure OpenAI GPT models. GPT models are pre-trained on vast amounts of natural language data, which helps them to understand structures and semantic patterns. The simplicity of Markdown’s markup allows GPT models to interpret structures such as headings, lists, and tables, as well as formatting such as links, emphasis (italic/bold), and code blocks.

 

When you combine these capabilities for data extraction with efficient prompting, you can easily and accurately extract relevant data as structured JSON.

 

Combining Azure AI Document Intelligence layout analysis with GPT prompting for data extraction

 

The following diagram illustrates this novel approach, introducing the new Markdown capabilities of Azure AI Document Intelligence’s pre-built layout model with completion requests to Azure OpenAI to extract the data.

 

A novel approach to efficient data extraction from documents using Azure AI Document Intelligence and Azure OpenAIA novel approach to efficient data extraction from documents using Azure AI Document Intelligence and Azure OpenAI

 

This approach is achieved in the following way:

 

  1. A customer uploads their files to analyze for data extraction. This could be of any supported file type, including PDF, image, or Word document.
  2. The application makes a request to the Azure AI Document Intelligence’s analyze API using the pre-built layout model with the output content format flag set to Markdown. The document data is provided in the request either as a base64 source or a URI.
    • If you are processing many, large documents, it is recommended to use a URI to reduce the memory utilization which will prevent unexpected behavior in your application. You can achieve this approach by uploading your documents to an Azure Blob Storage container and providing a SAS URI to the document.
  3. With the Markdown result as context, prompt the Azure OpenAI completions API with specific instruction to extract the structured data you require in a JSON format. With a now structured data response, you can store this data however you require for the needs of your application.

For a full code sample demonstrating this capability, check out the using Azure AI Document Intelligence and Azure OpenAI GPT-3.5 Turbo to extract structured data from documents sample on GitHub. Along with the code, this sample includes the necessary infrastructure-as-code Bicep templates to deploy the Azure resources for testing.

 

Conclusion

 

Adopting Azure AI Document Intelligence and Azure OpenAI to extract structured data from documents simplifies the challenges of document processing today. This well-rounded solution offers significant benefits over alternatives, removing the requirement to train custom models and improving overall accuracy of data extraction in most use cases.

 

Consider the following recommendations to maximize the benefits of this approach:

 

  • Experiment with prompting for data extraction. The provided code sample provides a well-rounded starting point for structure data extraction. Consider experimenting with the prompt and JSON schemas to incorporate domain specific language to capture the nuances in your documents to improve accuracy further.
  • Optimize the document processing workflow. As you scale out this approach to production, consider the host resource requirements for your application to process a large quantity of documents. Optimize this approach by maximizing CPU and memory usage by offloading the loading of documents to Azure AI Document Intelligence using URIs.

By adopting this approach, solution providers can streamline their document processing workflows, enhancing productivity for themselves and their customers.

 

Read more on AI Document Intelligence

 

Thank you for taking the time to read this article. We are sharing our insights for ISVs and Digital Natives that enable document intelligence in their AI-powered solutions, based on real-world challenges we encounter. We invite you to continue your learning through our additional insights in this series.

 

 

Further Reading

3 Comments
Co-Authors
Version history
Last update:
‎Apr 09 2024 07:31 AM
Updated by: