Document Generative AI: the Power of Azure AI Document Intelligence & Azure OpenAI Service Combined

NetaH · Jul 18 2023

Imagine being able to chat with your documents, generate captivating content from them, and use Azure OpenAI models on your own data. This is what Document Generative AI, a solution that combines Azure AI Document Intelligence (formerly known as Azure Form Recognizer) and Azure OpenAI Service, can do for you.

In the context of enterprise applications, the question we hear most often is: “How do I build something like ChatGPT that can read my documents and use them as the basis for its responses?”

Document Generative AI, powered by Azure AI Document Intelligence and Azure OpenAI Service, helps you unlock the full value of your documents and harness the capabilities of large generative AI models. It allows you to:

Seamlessly interact with your documents using natural language, enabling you to easily find answers and gain valuable insights.

Effortlessly generate new and engaging content from your existing documents, including blog posts, newsletters, summaries, and captivating captions.

By using Document Generative AI, you can save time, reduce costs, improve accuracy, and be more creative in your document workflows. Whether you need document chat, writing assistance, query support, search, or document translation, Document Generative AI handles complex and diverse document tasks using state-of-the-art AI models from OpenAI.

Embrace the future of document processing and take your enterprise data to new heights with Document Generative AI, a cutting-edge solution that will revolutionize the way you work with documents.


In our previous blog post, "Revolutionize your Enterprise Data with ChatGPT: Next-gen Apps w/ Azure OpenAI and Cognitive Search," we explored the potential of Azure Cognitive Search for building powerful chat apps. In this blog post, we describe how to extract information from your documents to enable chat over a variety of document types (scanned PDFs, digitized PDFs, images, Office documents), including documents with tables and long documents that exceed the maximum prompt length of an OpenAI model. Our goal is to give you the tools you need to build ChatGPT-powered applications starting today, using Azure OpenAI models and Azure AI services.

Enabling Chat with Diverse Document Types:

Chatting with documents poses several challenges:

Varying Document Formats:

Document Types: Diverse document types, such as scanned PDFs, digitized PDFs, images, and office documents, present unique challenges due to their different formats. Extracting information from each type requires specialized techniques and tools to handle the variations in data structure and content representation.
Optical Character Recognition (OCR) Accuracy: OCR plays a crucial role in extracting text from scanned documents and images. However, OCR accuracy can vary based on factors like document quality, font styles, and language complexities. Dealing with OCR errors and ensuring accurate text extraction is essential for reliable document intelligence.
Information Extraction: Different document types contain varying types of information, such as text, tables, images, and metadata. Extracting and organizing this information effectively requires sophisticated algorithms and techniques to parse and understand the document structure. Ensuring accurate extraction of relevant information is crucial for meaningful chat interactions.
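Tables are where plain-text extraction usually loses the most information. One common approach is to render each table returned by the Layout model as a Markdown table before chunking, so that rows and columns survive into the prompt. The sketch below assumes a hypothetical, simplified cell record whose row_index, column_index and content fields are modelled loosely on the table cells in the Document Intelligence Python SDK; it ignores merged (spanning) cells and treats the first row as the header.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TableCell:
    # Hypothetical, simplified stand-in for a Layout table cell:
    # zero-based row/column position plus the cell's text.
    row_index: int
    column_index: int
    content: str


def table_to_markdown(cells: List[TableCell], row_count: int, column_count: int) -> str:
    """Render a grid of cells as a Markdown table (first row treated as the header)."""
    grid = [["" for _ in range(column_count)] for _ in range(row_count)]
    for cell in cells:
        # Escape pipes so cell text cannot break the Markdown table syntax.
        grid[cell.row_index][cell.column_index] = cell.content.replace("|", "\\|").strip()

    lines = ["| " + " | ".join(grid[0]) + " |",
             "|" + "---|" * column_count]  # header separator row
    for row in grid[1:]:
        lines.append("| " + " | ".join(row) + " |")
    return "\n".join(lines)
```

Rendering tables this way keeps each value next to its column header, which is what lets a model answer questions about individual line items.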

Handling Long Documents:

OpenAI Prompt Length: OpenAI models limit the maximum length of the input prompt they can process, measured in tokens. Long documents often exceed this limit, so the entire document cannot be passed to the model at once. Breaking lengthy documents into manageable segments without losing context is therefore crucial for effective chat interactions.
Context Preservation: Long documents contain valuable contextual information that contributes to accurate understanding and meaningful conversations. However, due to prompt length limitations, it becomes difficult to preserve the complete context of the document while interacting with the model. Maintaining the necessary context throughout the conversation is essential to ensure coherent and relevant responses.
Efficient Processing: Processing large documents in real-time can be computationally intensive and time-consuming. Efficient algorithms and techniques are necessary to chunk and process the document segments in a way that optimizes resource usage and minimizes latency. Balancing processing efficiency with accuracy is key to enabling smooth chat interactions with long documents.
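A common way to keep context across segment boundaries is a sliding window with overlap, where each chunk repeats the last few words of the previous one. The sketch below is a minimal illustration, assuming whitespace-separated words as a rough stand-in for tokens; a real pipeline would count tokens with the model's tokenizer, and the max_words and overlap defaults are arbitrary.

```python
from typing import List


def chunk_words(text: str, max_words: int = 200, overlap: int = 40) -> List[str]:
    """Split text into overlapping windows of at most max_words words.

    Word counts are only a rough proxy for model tokens.
    """
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    chunks = []
    step = max_words - overlap  # each window starts `step` words after the last
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the current window has reached the end of the document.
        if start + max_words >= len(words):
            break
    return chunks
```

For example, chunk_words on the ten words "0" through "9" with max_words=4 and overlap=1 yields "0 1 2 3", "3 4 5 6" and "6 7 8 9". Larger overlaps preserve more context at the cost of indexing more duplicate text.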

Overcoming these Challenges:

To address these challenges, you can combine Azure AI Document Intelligence, Azure Cognitive Search, and Azure OpenAI models. Azure AI Document Intelligence OCR and Layout extraction handle format variations and accurately extract text, tables, and document structure, while document parsing and an intelligent chunking algorithm split long documents into segments that fit within the prompt limit. Together, these let you build chat-based applications that handle a wide range of document types, including lengthy documents, and hold meaningful conversations about their content.
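Chunking tends to work better when it follows the document's own structure. The sketch below shows one possible layout-aware strategy, not the algorithm used by the services described here: it assumes the extracted text is Markdown with #-style headings, groups paragraphs by section, splits sections longer than max_chars at paragraph boundaries, and keeps the section heading with every chunk so each piece carries its context. A single paragraph longer than max_chars is kept whole.

```python
import re
from typing import List, Tuple

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_by_section(markdown: str, max_chars: int = 1000) -> List[Tuple[str, str]]:
    """Group Markdown into (heading, body) chunks, one or more per section."""
    # Collect the lines that belong to each heading ("" = text before any heading).
    sections: List[Tuple[str, List[str]]] = [("", [])]
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            sections.append((match.group(2).strip(), []))
        else:
            sections[-1][1].append(line)

    chunks: List[Tuple[str, str]] = []
    for heading, body_lines in sections:
        # Blank lines separate paragraphs.
        paragraphs = [p.strip() for p in "\n".join(body_lines).split("\n\n") if p.strip()]
        current = ""
        for para in paragraphs:
            # Start a new chunk when adding this paragraph would exceed the limit.
            if current and len(current) + len(para) + 2 > max_chars:
                chunks.append((heading, current))
                current = para
            else:
                current = f"{current}\n\n{para}" if current else para
        if current:
            chunks.append((heading, current))
    return chunks
```

Storing the heading alongside each chunk also gives the search index a natural title field for citations.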

You can use Azure AI Document Intelligence to ingest your documents into Azure Cognitive Search with the following solutions and services:

Azure OpenAI on your data
Azure OpenAI on your data enables you to run supported chat models such as GPT-35-Turbo and GPT-4 on your data without needing to train or fine-tune models. You can ingest your documents into Cognitive Search using Azure AI Document Intelligence. If your documents include PDFs (scanned or digitized) or images (PNG, JPG, or TIFF), you will need to extract the information from them first. The data preparation script, run with the command below, extracts all the information from your documents using the Azure AI Document Intelligence Layout model. It also extracts table information and preserves formatting such as titles and subheadings, which makes the citations more readable.
```
python data_preparation.py --config config.json --njobs=4 --form-rec-resource <form-rec-resource-name> --form-rec-key <form-rec-key> --form-rec-use-layout
```
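Preserving titles and subheadings essentially means mapping each paragraph's layout role to Markdown. The sketch below illustrates the idea, assuming the paragraphs have already been extracted as (role, text) pairs. The role names title, sectionHeading, pageHeader, pageFooter and pageNumber are paragraph roles reported by the Layout model, but the input format and the choice to drop page headers, footers and numbers are assumptions of this sketch, not necessarily what data_preparation.py does.

```python
from typing import Iterable, Optional, Tuple

# Roles treated as page furniture rather than content (an assumption of this sketch).
SKIP_ROLES = {"pageHeader", "pageFooter", "pageNumber"}
# Markdown prefixes for structural roles; other paragraphs become plain text.
HEADING_PREFIX = {"title": "# ", "sectionHeading": "## "}


def paragraphs_to_markdown(paragraphs: Iterable[Tuple[Optional[str], str]]) -> str:
    """Convert (role, text) pairs into Markdown, keeping titles and headings."""
    blocks = []
    for role, text in paragraphs:
        if role in SKIP_ROLES:
            continue
        blocks.append(HEADING_PREFIX.get(role, "") + text.strip())
    return "\n\n".join(blocks)
```

Dropping repeated page headers and footers keeps them from being indexed once per page, and the heading markers give a chunker section boundaries to split on.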
ChatGPT + Enterprise data with Azure OpenAI and Cognitive Search
This GitHub sample demonstrates a few approaches for creating ChatGPT-like experiences over your own data using the Retrieval Augmented Generation (RAG) pattern. It uses Azure OpenAI Service to access the ChatGPT model (gpt-35-turbo) and Azure Cognitive Search for data indexing and retrieval. You can ingest your data into Cognitive Search using Azure AI Document Intelligence to extract information from PDFs and images; the repository includes a sample script for this.
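At its core, the Retrieval Augmented Generation pattern retrieves the chunks most relevant to a question and places them, labelled with source ids, into the prompt. The sketch below illustrates only that flow: a naive keyword-overlap score stands in for the Azure Cognitive Search query, the chunk ids are hypothetical, and the instruction wording is an assumption rather than the sample's actual prompt.

```python
import re
from typing import Dict, List, Set


def tokenize(text: str) -> Set[str]:
    """Lower-cased alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, chunks: Dict[str, str], top_k: int = 2) -> List[str]:
    """Return ids of up to top_k chunks that share the most words with the question.

    Keyword overlap is only a stand-in for a real search index query.
    """
    q = tokenize(question)
    ranked = sorted(chunks, key=lambda cid: len(q & tokenize(chunks[cid])), reverse=True)
    # Drop chunks with no overlap at all.
    return [cid for cid in ranked[:top_k] if q & tokenize(chunks[cid])]


def build_prompt(question: str, chunks: Dict[str, str], top_k: int = 2) -> str:
    """Ground the question in retrieved sources, labelled so the model can cite them."""
    sources = "\n".join(f"[{cid}]: {chunks[cid]}" for cid in retrieve(question, chunks, top_k))
    return (
        "Answer using only the sources below and cite each fact as [source id].\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )
```

Because only the retrieved chunks enter the prompt, the total document size no longer matters; only the chunk size and top_k must fit within the model's prompt limit.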

Chatting with your documents:

With Document Generative AI, you can now chat with your documents. For example, let's chat with invoices, contracts, and statements of work (SOWs) using the Azure OpenAI on your data service, ingesting the documents with the Azure AI Document Intelligence Layout model to preserve table information and document layout. In this scenario, a finance manager at Adventure Works is reviewing the latest invoices. The latest invoice from Contoso is unusually high ($6.5K), and its line items include $5K for web hosting. The manager finds the corresponding purchase order (PO) and SOW and confirms that the amounts match, but still wonders who signed such an expensive contract, and learns that it was the company's CEO.

[Figure: Example of a chat in the GitHub sample app]

[Figure: Example of a chat in the Azure OpenAI on your data web app]

[Figure: Example of a chat in Azure OpenAI Studio using Azure OpenAI on your data]

Note that you can now chat with tables, understand tables and line items, and unlock the information hidden in your documents. You can also check each response against its sources by viewing the citations: each statement in the response includes a citation with a link to the source content. The citations appear in context (the superscript numbers) and as links at the bottom. Clicking one displays the original content so the user can inspect it.
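Rendering citations is a small post-processing step on the model's answer. The sketch below assumes the answer marks its sources with inline markers of the form [doc1], [doc2] (the convention displayed by the Azure OpenAI on your data samples; treat it as an assumption for other setups) and that sources[N - 1] holds the content or URL for [docN]. It renumbers citations in order of first appearance, emits them as HTML superscripts, and builds the matching reference list.

```python
import re
from typing import List, Tuple

# Assumed citation marker format: [doc1], [doc2], ...
CITATION = re.compile(r"\[doc(\d+)\]")


def render_citations(answer: str, sources: List[str]) -> Tuple[str, List[str]]:
    """Replace [docN] markers with numbered superscripts and build a reference list."""
    order: List[int] = []  # doc numbers in order of first appearance

    def to_superscript(match) -> str:
        doc = int(match.group(1))
        if doc not in order:
            order.append(doc)
        return f"<sup>{order.index(doc) + 1}</sup>"

    text = CITATION.sub(to_superscript, answer)
    references = [f"{i + 1}. {sources[doc - 1]}" for i, doc in enumerate(order)]
    return text, references
```

Renumbering by first appearance keeps the superscripts sequential in the displayed answer even when the model cites sources out of order.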

More scenarios

In this blog post, we focused on conversation and question-answering scenarios over your documents that combine Azure AI Document Intelligence, ChatGPT from Azure OpenAI Service, and Azure Cognitive Search as a knowledge base and retrieval system. There are many other scenarios in which combining these services can improve your workflow and productivity:

Invoice processing: You can use our solution to automatically extract key information from invoices, such as vendor name, invoice number, date, amount, etc., and generate payment requests or summaries for your accounting system.
Report generation: You can use our solution to automatically generate new content based on your document data, such as charts, graphs, tables, summaries, etc., and create professional-looking reports for your stakeholders.
Document classification: You can use our solution to automatically classify your documents into different categories based on their content and layout, such as contracts, proposals, resumes, etc., and organize them for easy retrieval and management.
Document Q&A: You can use our solution to automatically answer questions about your documents in natural language using a chat-like interface. For example, you can ask "Who is the author of this document?" or "What is the main conclusion of this report?" and get instant answers from our solution.
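For the invoice-processing scenario, the generation step can start from fields that have already been extracted. The sketch below assumes a hypothetical Invoice record (vendor name, invoice number, date, total, and line items) and turns it into a one-line payment request, flagging invoices whose line items do not add up to the stated total, a check similar to the line-item review in the Contoso example above. All names and values here are illustrative.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import List, Tuple


@dataclass
class Invoice:
    # Hypothetical record of fields already extracted from one invoice.
    vendor_name: str
    invoice_number: str
    invoice_date: date
    total: Decimal
    line_items: List[Tuple[str, Decimal]]  # (description, amount) pairs


def payment_request(inv: Invoice) -> str:
    """Summarize an invoice as a one-line payment request and check its total."""
    if not inv.line_items:
        raise ValueError("invoice has no line items")
    line_sum = sum((amount for _, amount in inv.line_items), Decimal("0"))
    # Flag invoices whose line items do not add up to the stated total.
    status = "OK" if line_sum == inv.total else f"CHECK, line items sum to {line_sum}"
    description, amount = max(inv.line_items, key=lambda item: item[1])
    return (
        f"Pay {inv.vendor_name} {inv.total} for invoice {inv.invoice_number} "
        f"dated {inv.invoice_date.isoformat()}; largest item: {description} "
        f"({amount}); totals: {status}"
    )
```

Using Decimal rather than float avoids rounding surprises when comparing currency amounts.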

Try this out today, on your own data or ours

You can get started and try this out now via the Azure OpenAI Service on your data capability or via the sample code in this GitHub repo. Both solutions include the complete UX shown in this blog post. We plan to continuously expand our service and this repo, with a focus on covering more scenarios.

We’re excited about the prospect of improved and brand-new scenarios powered by large language models combined with the Document Generative AI solution. We look forward to seeing what you will build with Azure OpenAI, Azure AI Document Intelligence, and Azure Cognitive Search.
