In our previous blog post, Document Generative AI: the Power of Azure AI Document Intelligence & Azure OpenAI Service Combined, we introduced Document Generative AI and showed how you can use Azure AI Document Intelligence (formerly known as Azure Form Recognizer) and Azure OpenAI Service to enable chat over a variety of long enterprise documents.
Retrieval Augmented Generation (RAG) is a design pattern commonly used in Document Generative AI (for an example, see the repo here). It augments the capabilities of a Large Language Model (LLM) such as ChatGPT by adding an information retrieval system that supplies the data, which gives you control over the data the LLM uses when it formulates a response. Enterprise documents are usually long and complex. Although recent LLMs can accept much longer context, a good chunking strategy is still needed to divide documents into smaller pieces: smaller pieces are more efficient to store and retrieve, and they improve the relevance and interpretability of the results. However, most chunking strategies in RAG today are still based on text length, with little consideration of document structure. There is strong demand for semantic chunking: how do you divide a large body of text or documents into smaller, meaningful chunks based on semantic content rather than arbitrary splits?
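For contrast, a length-based splitter of the kind described above fits in a few lines. This is a minimal, generic Python sketch, not code from the sample repo; `chunk_by_length`, `chunk_size`, and `overlap` are hypothetical names chosen for illustration, and lengths are counted in characters rather than tokens to keep it simple.

```python
from typing import List


def chunk_by_length(text: str, chunk_size: int = 200, overlap: int = 20) -> List[str]:
    """Split text into fixed-size character windows that overlap by `overlap` characters.

    Hypothetical helper for illustration: it ignores headings, paragraphs, and tables.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far each new window moves forward
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # last window already reaches the end
            break
    return chunks


print(chunk_by_length("abcdefghij", chunk_size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

Because the windows ignore structure, a section heading can land in one chunk and its paragraph in the next, and a table row can be cut in half.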
The Azure AI Document Intelligence Layout model offers a comprehensive solution for semantic chunking by providing advanced content extraction and document structure analysis. With this model, you can extract paragraphs, tables, titles, section headings, selection marks, font/style, key-value pairs, math formulas, QR codes/barcodes and more from various document types. The extracted information can be output in Markdown format, so you can define your semantic chunking strategy on top of the provided building blocks.
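As an illustration of a strategy built on these building blocks, the sketch below splits Markdown at headings so that each chunk holds one section, with its paragraphs and tables kept together. It assumes you already have the Layout model's Markdown output as a string and that titles and section headings appear as `#`-prefixed lines; `chunk_markdown_by_headings` and `max_level` are hypothetical names, and this is one possible strategy rather than an approach prescribed by the service.

```python
import re
from typing import Dict, List

# Assumption: headings in the Markdown are ATX-style lines such as "# Title" or "## Section".
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def chunk_markdown_by_headings(markdown: str, max_level: int = 2) -> List[Dict[str, str]]:
    """Split Markdown into chunks at headings of level <= max_level (hypothetical helper).

    Each chunk records its heading path (e.g. "Title > Section A"). Splits happen only at
    heading lines, so the paragraphs and tables under a heading stay in the same chunk.
    Headings deeper than max_level are kept inside the chunk body.
    """
    chunks: List[Dict[str, str]] = []
    path: List[str] = []   # current heading titles; index i holds the level-(i + 1) heading
    lines: List[str] = []  # body lines collected for the current section

    def flush() -> None:
        body = "\n".join(lines).strip()
        if body:
            chunks.append({"heading": " > ".join(path), "content": body})
        lines.clear()

    for line in markdown.splitlines():
        match = HEADING_RE.match(line)
        if match and len(match.group(1)) <= max_level:
            flush()  # close the previous section before starting a new one
            level = len(match.group(1))
            del path[level - 1:]  # drop headings at this level or deeper
            path.append(match.group(2))
        else:
            lines.append(line)
    flush()  # emit the final section
    return chunks


sample = "# Title\nIntro text.\n## Section A\nPara A.\n| a | b |\n|---|---|\n| 1 | 2 |\n## Section B\nPara B."
for chunk in chunk_markdown_by_headings(sample):
    print(chunk["heading"], "->", chunk["content"].splitlines()[0])
```

Keeping the heading path with each chunk lets you store it as metadata in your index. Sections that are still too long for your embedding model could be split further with a length-based fallback.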
Figure 1: The Layout model can detect document structures and output them in Markdown.
Figure 2: The Layout model can extract tables from your document.