In today’s digital era, where data is the new gold, efficiently extracting and processing information from complex documents, including those with dynamic tables, is crucial for businesses. Microsoft’s Azure AI services offer robust solutions for tackling these challenges, especially through the Document Intelligence Layout model. In this blog post, we will explore how you can use markdown output to enhance the capabilities of Azure Document Intelligence Layout model and subsequently feed this refined data into Azure OpenAI service for comprehensive information extraction.
Understanding Azure Document Intelligence Layout Model
The Azure Document Intelligence Layout model is a powerful tool within the Azure AI ecosystem designed to understand and interpret the layout and structure of documents. It can analyze various elements, such as text, tables, and selection marks, making it an invaluable asset for processing complex documents. Especially extracting tables is a key requirement for processing documents containing large volumes of data typically formatted as tables. The Layout model extracts tables in the pageResults
section of the JSON output. Extracted table information includes the number of columns and rows, row span, and column span. Each cell with its bounding polygon is output along with information whether the area is recognized as a columnHeader
or not. The model supports extracting tables that are rotated. Each table cell contains the row and column index and bounding polygon coordinates. For the cell text, the model outputs the span
information containing the starting index (offset
). The model also outputs the length
within the top-level content that contains the full text from the document.
{
"tables": [
{
"rowCount": 9,
"columnCount": 4,
"cells": [
{
"kind": "columnHeader",
"rowIndex": 0,
"columnIndex": 0,
"columnSpan": 4,
"content": "(In millions, except earnings per share)",
"boundingRegions": [],
"spans": []
},
]
}
]
}
However this format can be difficult to use if you need to further harness this data by feeding it to Azure OpenAI, for large complex tables it may be quite verbose to be used inside a prompt. On the other hand if we use the plain text output the tables structure is getting lost.
Markdown as a Bridge
Markdown, a lightweight markup language with plain-text formatting syntax, can serve as an intermediary format to bridge the gap between raw document data and structured data analysis. By converting document layouts into markdown, we can simplify the process of structuring document information before feeding it into AI models for extraction.
Step-by-Step Guide to Extracting Information
1. Preparation of Documents: Start with gathering the documents you wish to analyze. These could be in various formats, such as PDFs, Word documents, or images.
2. Document Analysis with Azure Document Intelligence Layout Model: Utilize the Azure Document Intelligence Layout model to analyze the document structure. This model will identify and categorize different elements within your documents, such as paragraphs, tables, and headings.
3. Conversion to Markdown: The Layout API can output the extracted text in markdown format. Use the outputContentFormat=markdown
to specify the output format in markdown. The markdown content is output as part of the content
section.
"analyzeResult": {
"apiVersion": "2024-02-29-preview",
"modelId": "prebuilt-layout",
"contentFormat": "markdown",
"content": "# CONTOSO LTD...",
}
We can do the same in Document Intelligence Studio -> Layout Model-> Analyze Options.
4. Information Extraction with Azure AI: With the document information now structured in markdown, you can leverage various Azure AI services to extract specific information. This method signs when used with Azure OpenAI because when instructing the model to read the markdown tables as such in the prompt then you can easily and accurately query the information in the tables.
5. Post-Extraction Processing: After extraction, the data can be further processed or analyzed based on your business needs. This might involve aggregating data from multiple documents, performing data visualization, or integrating the extracted information into business workflows.
Advantages
The use of markdown as an intermediary format offers several advantages:
- Simplified Data Structure : Markdown simplifies the document’s layout, making it easier for AI models to process the information.
- Flexibility: Markdown is widely supported and can be easily converted into other formats or displayed on different platforms.
- Efficiency : This approach can handle documents with dynamic tables and varying layouts, reducing manual preprocessing work.
Conclusion
Azure Document Intelligence Layout model with markdown output presents a sophisticated approach to processing and extracting information from complex documents. Azure AI’s capabilities help businesses unlock valuable insights hidden within their documents, enhancing decision-making and operational efficiency. This process not only streamlines data extraction but also opens new avenues for automating and optimizing document-intensive workflows.