Extract values and line items from invoices with Form Recognizer now generally available

NetaH

Microsoft

Jun 03, 2021

Extract values and line items from invoices with Form Recognizer

Authors: Cha Zhang, Anatoly Ponomarev, Ben Ufuk Tezcan, Neta Haiby

Invoice automation is a key component of accounts payable processes. Companies often need to extract key-value pairs such as ship to, bill to, total, and invoice ID, as well as line items and their details, such as item name, quantity, and price.

Processing payments is tedious and complex because invoices arrive from various sources and in various formats. Companies have no control over how these invoices are generated, and the significant variation among them makes automation very challenging. For example, extracting line item details is one of the hardest problems, because their structure differs from invoice to invoice and they can be laid out in many ways; see the example below.

A typical invoice automation solution includes three major steps: document digitization, data extraction, and a human in the loop who reviews results manually to increase accuracy. Because invoice layouts are so complex, existing solutions either resort to a fully manual process or require extensive effort to build many templates to cover the large variety of invoices, which does not scale.

Form Recognizer Invoice API

Recently there have been significant advances in natural language processing (NLP), where researchers apply Transformer-based language models such as BERT, pre-trained on billions of documents. These deep learning models achieve state-of-the-art results on many NLP problems, such as named entity recognition, question answering, and text summarization. Nevertheless, it is not straightforward to apply these models to invoice automation, for the following reasons:

A large percentage of invoices are scanned, faxed, or captured with mobile cameras, hence optical character recognition (OCR) is required to extract the raw texts.
Even with OCR, the extracted texts are not continuous. Concatenating the texts and directly applying Transformer-based models leads to poor accuracy, because special structure of the content is lost during the process.

In the latest release of Form Recognizer, we offer a state-of-the-art, pre-built invoice extraction API with groundbreaking deep learning technology that combines both the text and the structure/visual information of the input documents. Similar to BERT, the model is pre-trained on a large quantity of documents. However, while BERT leverages only the text information, our pre-training also includes structure and visual information. After pre-training, the model is fine-tuned with carefully labeled invoices to achieve high accuracy on all fields and line items. The figure below shows a selected set of benchmark results on an internal invoice data set that contains 460 invoices never seen during training.

Easy and simple to use

Get started with the Form Recognizer Sample Tool to extract invoices from documents.

It is also as simple as two API calls, with no training or preprocessing needed. Just call the Analyze Invoice operation with your document (image, TIFF, or PDF file) as the input and extract the text, tables, invoice key-value pairs, and line items from your invoices. Form Recognizer invoice extraction supports multipage PDF and TIFF files, as well as the JPG, PNG, and BMP file formats.

Form Recognizer Analyze Invoice API

Step 1: The Analyze Invoice operation

https://{endpoint}/formrecognizer/v2.1/prebuilt/invoice/analyze[?includeTextDetails][&locale][&pages]

The Analyze Invoice call returns a response header field called Operation-Location. The Operation-Location value is a URL that contains the Result ID to be used in the next step.
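As a minimal sketch of Step 1, the snippet below builds the Analyze Invoice URL with its optional query parameters and submits a document using only the Python standard library. The function names are hypothetical. The `Ocp-Apim-Subscription-Key` header and the `application/pdf` content type are not stated in this post; they are assumptions based on the usual Cognitive Services conventions and should be checked against the API reference.

```python
import urllib.request
from urllib.parse import urlencode


def build_analyze_url(endpoint, include_text_details=False, locale=None, pages=None):
    """Build the Analyze Invoice URL, adding only the optional parameters that were set."""
    base = f"https://{endpoint}/formrecognizer/v2.1/prebuilt/invoice/analyze"
    params = {}
    if include_text_details:
        params["includeTextDetails"] = "true"
    if locale:
        params["locale"] = locale
    if pages:
        params["pages"] = pages
    return f"{base}?{urlencode(params)}" if params else base


def submit_invoice(endpoint, api_key, document_bytes, content_type="application/pdf"):
    """POST the document and return the Operation-Location response header."""
    request = urllib.request.Request(
        build_analyze_url(endpoint),
        data=document_bytes,
        headers={
            # Assumed header name for the subscription key.
            "Ocp-Apim-Subscription-Key": api_key,
            "Content-Type": content_type,
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.headers["Operation-Location"]
```

Keeping URL construction separate from the network call makes the query-string handling easy to check without a live resource.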

Example Operation-Location value:
https://cognitiveservice/formrecognizer/v2.1-preview.3/prebuilt/invoice/analyzeResults/44a436324-fc4b-4387-aa06-090cfbf0064f
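If only the Result ID is needed, it can be read from the last path segment of the Operation-Location URL. The helper below is a small sketch with a hypothetical name; it assumes the URL always ends in `analyzeResults/{resultId}`, as in the example above.

```python
from urllib.parse import urlparse


def result_id_from_operation_location(operation_location):
    """Return the last path segment of the Operation-Location URL (the Result ID)."""
    path = urlparse(operation_location).path
    return path.rstrip("/").rsplit("/", 1)[-1]


example = (
    "https://cognitiveservice/formrecognizer/v2.1-preview.3/prebuilt/invoice/"
    "analyzeResults/44a436324-fc4b-4387-aa06-090cfbf0064f"
)
print(result_id_from_operation_location(example))
```

In practice the full Operation-Location URL can usually be called as-is, so extracting the ID is mainly useful for logging or for rebuilding the URL later.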

Once you have the operation location, call the Get Analyze Invoice Result operation. This operation takes as input the Result ID that was created by the Analyze Invoice operation. It returns a JSON response that contains a status field with the following possible values: notStarted, running, failed, or succeeded.

Step 2: The Get Analyze Invoice Result operation

https://{endpoint}/formrecognizer/v2.1/prebuilt/invoice/analyzeResults/{resultId}
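Because the analysis runs asynchronously, Step 2 is typically called in a loop until the status becomes terminal. The following is a sketch of such a polling loop, not a definitive client. The function names, the one-second interval, and the attempt limit are illustrative assumptions, as are the status strings `notStarted`, `running`, `succeeded`, and `failed` and the `Ocp-Apim-Subscription-Key` header.

```python
import json
import time
import urllib.request


def fetch_result(operation_location, api_key):
    """GET the Operation-Location URL and decode the JSON body."""
    request = urllib.request.Request(
        operation_location,
        # Assumed header name for the subscription key.
        headers={"Ocp-Apim-Subscription-Key": api_key},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_result(fetch, interval_seconds=1.0, max_attempts=30):
    """Call fetch() until the status is terminal, then return the parsed response."""
    for _ in range(max_attempts):
        result = fetch()
        status = result.get("status")
        if status == "succeeded":
            return result
        if status == "failed":
            raise RuntimeError("invoice analysis failed")
        # Any other status (for example notStarted or running): wait and retry.
        time.sleep(interval_seconds)
    raise TimeoutError("invoice analysis did not finish in time")
```

A caller might combine the two as `wait_for_result(lambda: fetch_result(operation_location, api_key))`. Passing the fetch step in as a function keeps the loop testable without network access.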

The Get Analyze Invoice Result operation returns a JSON response containing the extracted invoice fields.

For example: