In our previous blog post, Document Generative AI: the Power of Azure AI Document Intelligence & Azure OpenAI Service Combined, we introduced what Document Generative AI is and how you can use Azure AI Document Intelligence (formerly known as Azure Form Recognizer) and Azure OpenAI Service to enable chat over a variety of long enterprise documents.
Retrieval Augmented Generation (RAG) is a design pattern commonly used in Document Generative AI (for an example, see the repo here). It augments the capabilities of a Large Language Model (LLM) like ChatGPT with an information retrieval system that supplies the data, giving you control over the grounding data the LLM uses when it formulates a response. Enterprise documents are usually long and complex. Even though LLMs can now accept more context, a good chunking strategy is still required to divide them into smaller pieces that are more efficient to store and retrieve and that improve the relevance and interpretability of the results. However, most chunking strategies in RAG today are still based on text length, with little consideration for document structure. There is a high demand for semantic chunking: how do you divide a large body of text or documents into smaller, meaningful chunks based on semantic content rather than arbitrary splits?
The Azure AI Document Intelligence Layout model offers a comprehensive solution for semantic chunking by providing advanced content extraction and document structure analysis capabilities. With this model, you can easily extract paragraphs, tables, titles, section headings, selection marks, font/style information, key-value pairs, math formulas, QR codes/barcodes, and more from various document types. The extracted information can be conveniently output in markdown format, enabling you to define your semantic chunking strategy based on the provided building blocks.
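For illustration, a minimal Python sketch of this call might look like the following, assuming the azure-ai-documentintelligence SDK with placeholder endpoint, key, and document URL values (exact class and parameter names can differ slightly between SDK versions):

```python
# Minimal sketch: analyze a document with the prebuilt Layout model and get markdown output.
# Assumes the azure-ai-documentintelligence package; endpoint, key, and URL are placeholders.
from azure.core.credentials import AzureKeyCredential
from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.ai.documentintelligence.models import AnalyzeDocumentRequest

client = DocumentIntelligenceClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>"),
)

# Request markdown as the output content format so headings, tables, and
# paragraphs are preserved as structural building blocks for chunking.
poller = client.begin_analyze_document(
    "prebuilt-layout",
    AnalyzeDocumentRequest(url_source="https://<your-storage>/sample-report.pdf"),
    output_content_format="markdown",
)
result = poller.result()
markdown_text = result.content  # the full document as markdown
print(markdown_text[:500])
```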
Benefits of using the Layout Model:
- Simplified process: You can parse different document types, such as digital and scanned PDFs, images, Office files (docx, xlsx, pptx), and HTML, with a single API call.
- Scalability and AI quality: The model scales to high document volumes and delivers high-quality, AI-driven results for Optical Character Recognition (OCR), table extraction, document structure analysis (e.g., paragraphs, titles, section headings), and reading order detection. It supports 309 printed and 12 handwritten languages.
- LLM compatibility: The output is available in an LLM-friendly format, markdown, which integrates smoothly into your workflows. Any table in a document can be emitted as markdown, saving significant parsing effort and helping an LLM better understand the content.
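Once you have the markdown output, one simple way to implement semantic chunking is to split on section headings rather than on a fixed character count. The helper below is only an illustration of that idea (the function name and the markdown_text variable from the earlier snippet are assumptions, not part of the service):

```python
import re

def chunk_markdown_by_heading(markdown_text: str, max_heading_level: int = 2) -> list[str]:
    """Split markdown into chunks at headings up to the given level.

    Each chunk keeps its heading, so retrieved chunks stay interpretable.
    This is a simple illustration, not an official chunking utility.
    """
    # Find every line that starts with 1..max_heading_level '#' characters.
    pattern = re.compile(rf"^#{{1,{max_heading_level}}}\s+.+$", re.MULTILINE)
    starts = [m.start() for m in pattern.finditer(markdown_text)]
    if not starts or starts[0] != 0:
        starts = [0] + starts  # keep any preamble before the first heading
    boundaries = starts + [len(markdown_text)]
    return [
        markdown_text[begin:end].strip()
        for begin, end in zip(boundaries, boundaries[1:])
        if markdown_text[begin:end].strip()
    ]

# Example usage with the markdown produced by the Layout model above.
chunks = chunk_markdown_by_heading(markdown_text, max_heading_level=2)
print(f"{len(chunks)} chunks; first chunk:\n{chunks[0][:300]}")
```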
Figure 1: The Layout model detects document structure and outputs it as markdown.
Figure 2: The Layout model extracts tables from your document.
Getting started
Azure AI Document Intelligence Studio
- Go to Document Intelligence Studio - Microsoft Azure and choose the analyze options you need.
- Click Run analysis, then view the output and sample code in the right pane.
SDK and REST API
- Quickstart: Document Intelligence SDKs – use your preferred SDK or REST API to extract content and structure from documents.
- Sample code for using the Layout API to produce markdown output.
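If you prefer calling the REST API directly, the same markdown output can be requested with the outputContentFormat=markdown query parameter. The following sketch uses Python's requests library; the endpoint, key, document URL, and api-version are placeholders to replace with your own values:

```python
# Sketch of the Layout-to-markdown call via the REST API; endpoint, key, document URL,
# and api-version are placeholders -- check the docs for the current api-version.
import time
import requests

endpoint = "https://<your-resource>.cognitiveservices.azure.com"
key = "<your-key>"
url = (
    f"{endpoint}/documentintelligence/documentModels/prebuilt-layout:analyze"
    "?api-version=<api-version>&outputContentFormat=markdown"
)

# Submit the document for analysis; the service replies with a polling URL.
response = requests.post(
    url,
    headers={"Ocp-Apim-Subscription-Key": key, "Content-Type": "application/json"},
    json={"urlSource": "https://<your-storage>/sample-report.pdf"},
)
response.raise_for_status()
operation_url = response.headers["Operation-Location"]

# Poll until the analysis finishes, then read the markdown content.
while True:
    result = requests.get(operation_url, headers={"Ocp-Apim-Subscription-Key": key}).json()
    if result["status"] in ("succeeded", "failed"):
        break
    time.sleep(2)

markdown_text = result["analyzeResult"]["content"]
```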
Build “chat with your document” with semantic chunking
- This cookbook shows a simple demo of the RAG pattern in LangChain, with Azure AI Document Intelligence as the document loader and Azure AI Search as the retriever (a rough ingestion sketch follows after this list).
- This solution accelerator demonstrates an end-to-end baseline RAG pattern sample that uses Azure AI Search as a retriever and Azure AI Document Intelligence for document loading and semantic chunking.
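As a rough sketch of what such a pipeline wires together (package names, parameters, and the choice of MarkdownHeaderTextSplitter here are assumptions based on current LangChain integrations, not the cookbook's or accelerator's exact code), loading a document through Document Intelligence, chunking on markdown headings, and indexing into Azure AI Search might look like this:

```python
# Rough sketch of the RAG ingestion path: Document Intelligence as loader,
# heading-based semantic chunking, Azure AI Search as the vector store.
# Endpoints, keys, deployment names, and the index name are placeholders.
from langchain_community.document_loaders import AzureAIDocumentIntelligenceLoader
from langchain_community.vectorstores.azuresearch import AzureSearch
from langchain_openai import AzureOpenAIEmbeddings
from langchain_text_splitters import MarkdownHeaderTextSplitter

# 1. Load the document as markdown via the Layout model.
loader = AzureAIDocumentIntelligenceLoader(
    api_endpoint="https://<your-di-resource>.cognitiveservices.azure.com/",
    api_key="<your-di-key>",
    file_path="<path-to-your-document>.pdf",
    api_model="prebuilt-layout",
)
docs = loader.load()

# 2. Chunk semantically on section headings instead of fixed lengths.
splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=[("#", "title"), ("##", "section")]
)
chunks = splitter.split_text(docs[0].page_content)

# 3. Embed the chunks and index them into Azure AI Search for retrieval.
embeddings = AzureOpenAIEmbeddings(
    azure_endpoint="https://<your-aoai-resource>.openai.azure.com/",
    api_key="<your-aoai-key>",
    azure_deployment="<your-embedding-deployment>",
    api_version="<api-version>",
)
vector_store = AzureSearch(
    azure_search_endpoint="https://<your-search-resource>.search.windows.net",
    azure_search_key="<your-search-key>",
    index_name="<your-index-name>",
    embedding_function=embeddings.embed_query,
)
vector_store.add_documents(documents=chunks)
```

At query time, the same vector store can then be used as the retriever that feeds relevant chunks to the LLM, as the cookbook and solution accelerator above demonstrate end to end.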