AI - Azure AI services Blog

Announcing the General Availability of Document Intelligence v4.0 API

Vinod_Kurpad
Microsoft
Dec 17, 2024

The Document Intelligence v4.0 API adds enhanced support for RAG, with document structure, tables, and figures in markdown output; an improved OCR model with searchable PDF output; and new and improved field extraction models for prebuilt and custom document types.

The Document Intelligence v4.0 API is now generally available! This version brings new and updated capabilities across the entire product: updates to the Read and Layout APIs for content extraction, prebuilt and custom extraction models for schema extraction from documents, and classification models. Document Intelligence has the tools to enable RAG and document automation solutions for structured and unstructured documents.

Enhanced Layout capabilities

Layout output contains markdown structure and figures

This release brings significant updates to Layout, making it the default choice for document ingestion, with enhanced support for Retrieval-Augmented Generation (RAG) workflows.

The Layout API now offers a markdown output format that gives generative AI models a better representation of document elements such as headers, footers, sections, section headers, and tables. This structured output enables semantic chunking of content, making it easier to ingest documents into RAG workflows and generate more accurate results. Try Layout in the Document Intelligence Studio or use Layout as a skill in your RAG pipelines with Azure AI Search.
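To make the idea of semantic chunking concrete, here is a minimal sketch that splits markdown text into one chunk per heading-delimited section. It does not call the service: the `sample` string is a made-up stand-in for Layout markdown output, and the function name `chunk_markdown_by_heading` is hypothetical.

```python
import re

# Matches markdown ATX headings such as "# Title" or "## Section".
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_markdown_by_heading(markdown: str) -> list[dict]:
    """Split markdown into chunks, one per heading-delimited section."""
    sections = []
    current = {"heading": None, "lines": []}

    def has_content(section: dict) -> bool:
        return section["heading"] is not None or any(
            line.strip() for line in section["lines"]
        )

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            # A new heading closes the previous section, if it held anything.
            if has_content(current):
                sections.append(current)
            current = {"heading": match.group(2).strip(), "lines": []}
        else:
            current["lines"].append(line)
    if has_content(current):
        sections.append(current)

    return [
        {"heading": s["heading"], "text": "\n".join(s["lines"]).strip()}
        for s in sections
    ]


# Made-up sample standing in for markdown returned by Layout.
sample = """# Invoice

Issued by Contoso.

## Line items

| Item | Qty |
| - | - |
| Widget | 2 |
"""

chunks = chunk_markdown_by_heading(sample)
for chunk in chunks:
    print(chunk["heading"], "->", len(chunk["text"]), "chars")
```

Because each table stays inside the section it belongs to, a chunk can be embedded together with its heading as context. This sketch treats any line starting with `#` as a heading, so markdown that contains fenced code would need extra handling.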

Searchable PDF output

Generate a searchable PDF

Document Intelligence no longer outputs only JSON! With the 4.0 release, you can generate a searchable PDF from an input document. The recognized text is overlaid on the scanned image, making all the content in the document instantly searchable. This feature improves the accessibility and usability of your documents, allowing for quick and efficient information retrieval. Try the new searchable PDF output in the Studio or learn more.

Searchable PDF is available as an output from the Read API at no additional cost. This release also includes several updates to the OCR model to better handle complex text recognition challenges.
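As a rough sketch of what requesting searchable PDF output could look like over REST, the helpers below only build the two request URLs and send nothing. The post does not give these details: the `api-version` value, the `output=pdf` query parameter, the `prebuilt-read` model ID, and the `/analyzeResults/{resultId}/pdf` path are assumptions based on my reading of the public REST reference, so verify them there before relying on them. The function names and the example endpoint are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: api-version string for the v4.0 GA release.
API_VERSION = "2024-11-30"


def analyze_url(endpoint: str, model_id: str = "prebuilt-read") -> str:
    """URL for the POST that starts analysis and asks for PDF output."""
    # Assumption: "output=pdf" requests the searchable PDF alongside JSON.
    query = urlencode({"api-version": API_VERSION, "output": "pdf"})
    base = endpoint.rstrip("/")
    return f"{base}/documentintelligence/documentModels/{model_id}:analyze?{query}"


def pdf_result_url(endpoint: str, result_id: str, model_id: str = "prebuilt-read") -> str:
    """URL for the GET that downloads the PDF once analysis has succeeded."""
    query = urlencode({"api-version": API_VERSION})
    base = endpoint.rstrip("/")
    return (
        f"{base}/documentintelligence/documentModels/{model_id}"
        f"/analyzeResults/{result_id}/pdf?{query}"
    )


# Hypothetical resource endpoint and result ID, for illustration only.
endpoint = "https://example.cognitiveservices.azure.com/"
print(analyze_url(endpoint))
print(pdf_result_url(endpoint, "1234"))
```

Under these assumptions, a single analyze call produces both results: the JSON comes from the usual result URL and the PDF from the `/pdf` URL for the same result ID.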

New and updated Prebuilt models

New prebuilt model for paystubs

Prebuilt models offer a simple API to extract a defined schema from known document types. The v4.0 release adds new prebuilt models for mortgage processing, bank document processing, paystubs, credit/debit cards, checks, and marriage certificates, as well as prebuilt models for variants of the 1095, W-4, and 1099 forms for US tax processing scenarios. These models are ideal for extracting specific details from documents like bank statements, checks, paystubs, and various tax forms. With over 22 prebuilt model types, Document Intelligence has models for common documents in procurement, tax, mortgage, and financial services. See the models overview for a complete list of document types supported with prebuilt models.

Query field add-on capability

Query field is an add-on capability that extends the schema extracted from any prebuilt model. It is ideal when you have simple fields that need to be extracted. Query fields also work with Layout, so for simple documents you don't need to train a custom model: just define the query fields and begin processing the document. Query field supports a maximum of 20 fields per request. Try query field in the Document Intelligence Studio with Layout or any prebuilt model.
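The 20-field limit is easy to enforce on the client before sending a request. The sketch below builds the query-string portion for a query field request; the parameter names `features=queryFields` and `queryFields=<comma-separated names>` are assumptions based on my reading of the REST reference, and the helper name and field names are hypothetical.

```python
from urllib.parse import urlencode

MAX_QUERY_FIELDS = 20  # per-request limit stated in the post


def query_field_params(field_names: list[str]) -> str:
    """Build the query-string fragment for a query field request."""
    cleaned = [name.strip() for name in field_names if name.strip()]
    if not cleaned:
        raise ValueError("at least one query field is required")
    if len(cleaned) > MAX_QUERY_FIELDS:
        raise ValueError(
            f"{len(cleaned)} query fields requested; the limit is {MAX_QUERY_FIELDS}"
        )
    # Assumption: the add-on is enabled with features=queryFields and the
    # field names are passed as one comma-separated value.
    return urlencode(
        {"features": "queryFields", "queryFields": ",".join(cleaned)},
        safe=",",  # keep commas readable instead of encoding them as %2C
    )


# Hypothetical field names for illustration.
print(query_field_params(["Terms", "PONumber"]))
```

The fragment would be appended to an analyze URL for Layout or a prebuilt model, next to the `api-version` parameter.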

Document classification model

The custom classification models are updated to improve the classification process and now support multi-language documents and incremental training. Incremental training lets you update the classifier model with additional samples or classes without needing the entire training dataset. Classifiers also support analyzing Office document types (.docx, .pptx, and .xls). Version 4.0 adds a classifier copy operation for copying your classifier across resources, regions, or subscriptions, making model management easier. This version also changes the splitting behavior: by default, the custom classification model no longer splits documents during analysis. Learn more about the classification and splitting capabilities.

Improvements to Custom Extraction models

Custom extraction models now output confidence scores for tables, table rows, and cells. This makes validating model results much easier and provides the tools to trigger human reviews. Custom model capabilities have also improved with the addition of signature detection to neural models and support for overlapping fields. Neural models now include a paid training tier for when you have a large dataset of labeled documents to train on. Paid training allows longer training runs, which helps the model perform better across the variations in your training dataset. Learn more about improvements to custom extraction models.
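One way cell-level confidence scores could drive human review is a simple threshold check, sketched below. The dictionary shape used here (`rows`, `cells`, `name`, `confidence`) is a simplified, hypothetical structure chosen for illustration, not the service's actual response schema, and the 0.8 threshold is an arbitrary example value.

```python
def cells_needing_review(table: dict, threshold: float = 0.8) -> list[tuple[int, str, float]]:
    """Return (row index, cell name, confidence) for cells below the threshold."""
    flagged = []
    for row_index, row in enumerate(table["rows"]):
        for cell in row["cells"]:
            if cell["confidence"] < threshold:
                flagged.append((row_index, cell["name"], cell["confidence"]))
    return flagged


# Hypothetical, simplified table result with table, row, and cell confidence.
table = {
    "confidence": 0.93,
    "rows": [
        {
            "confidence": 0.98,
            "cells": [
                {"name": "Item", "content": "Widget", "confidence": 0.99},
                {"name": "Qty", "content": "2", "confidence": 0.97},
            ],
        },
        {
            "confidence": 0.61,
            "cells": [
                {"name": "Item", "content": "Gadget", "confidence": 0.95},
                {"name": "Qty", "content": "7", "confidence": 0.42},
            ],
        },
    ],
}

flagged = cells_needing_review(table)
print(flagged)
```

Checking at the cell level means a reviewer only has to look at the specific values in doubt rather than the whole table.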

New implementation of model compose for greater flexibility

In the past, you could compose multiple custom extraction models into a single composed model. When a document was analyzed with a composed model, the service picked the model best suited to process the document. In this version, model compose has a new implementation that requires a classification model in addition to the extraction models. This enables processing multiple instances of the same document type with splitting, conditional routing, and more. Learn more about the new model compose implementation.
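The classify-then-route idea can be pictured with a small client-side sketch. This is a conceptual illustration only, not how the service implements composed models: the classifier output shape, the model IDs, and the confidence cutoff are all hypothetical.

```python
def route_documents(classified: list[dict], routes: dict[str, str], min_confidence: float = 0.7):
    """Return (pages, model_id) pairs; model_id is None when no route applies."""
    plan = []
    for doc in classified:
        model_id = routes.get(doc["doc_type"])
        if doc["confidence"] < min_confidence:
            # Low-confidence classification: leave for manual review.
            model_id = None
        plan.append((doc["pages"], model_id))
    return plan


# Hypothetical mapping from document type to extraction model ID.
routes = {"invoice": "invoice-extractor", "receipt": "receipt-extractor"}

# Hypothetical classifier output for one file that was split into three documents.
classified = [
    {"doc_type": "invoice", "confidence": 0.95, "pages": [1, 2]},
    {"doc_type": "invoice", "confidence": 0.91, "pages": [3]},
    {"doc_type": "receipt", "confidence": 0.40, "pages": [4]},
]

plan = route_documents(classified, routes)
for pages, model_id in plan:
    print(pages, "->", model_id or "manual review")
```

In this example the two invoices found in the same file are each sent to the invoice extractor, while the low-confidence receipt is held back, which is the kind of conditional routing a classifier makes possible.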

Get started with the v4.0 API today

The Document Intelligence v4.0 API is packed with many more updates. Start with the what's new page to learn more. You can try all of the new and updated capabilities in the Document Intelligence Studio. Explore the new REST API or the language-specific SDKs to start building or updating your document workflows.

 

Updated Dec 17, 2024
  • yyan
    Copper Contributor

    Questions:

    1. How can we generate JSON and searchable PDF output in one call?
    2. What's the easiest way to generate an hOCR file?
    3. For the Layout model, can we generate both JSON and markdown output?
    4. When will searchable PDF output also be available for the Layout model?