Extract values and line items from invoices with Form Recognizer
Authors: Cha Zhang, Anatoly Ponomarev, Ben Ufuk Tezcan, Neta Haiby
Invoice automation is a key component of accounts payable processes. Companies often need to extract key-value pairs such as ship to, bill to, total, and invoice ID, as well as line items and their details such as item name, item quantity, and item price.
Processing payments is a tedious and complex process in which invoices arrive from various sources and in various formats. Companies have no control over how these invoices are generated, and the significant variation among them makes automation very challenging. For example, extracting item details from invoices is one of the most complex problems, because their structure differs and they can be displayed in various ways (see the example below).
A typical invoice automation solution includes three major steps: document digitization, data extraction, and a human in the loop for manual review to increase accuracy. Due to the complexity of invoice layouts, existing solutions either resort to a fully manual process or require extensive effort to build many templates to cover the large variation among invoices, which is not scalable.
Form Recognizer Invoice API
Recently there have been significant advances in natural language processing (NLP), where researchers apply Transformer-based language models such as BERT, pre-trained on billions of documents. These deep learning models achieve state-of-the-art results on many NLP problems, such as named entity recognition, question answering, and text summarization. Nevertheless, it is not straightforward to apply these models to invoice automation, for the following reasons:
- A large percentage of invoices are scanned, faxed, or captured with mobile cameras, so optical character recognition (OCR) is required to extract the raw text.
- Even with OCR, the extracted text is not continuous. Concatenating the text and directly applying Transformer-based models leads to poor accuracy, because the spatial structure of the content is lost in the process.
In the latest release of Form Recognizer, we offer a state-of-the-art, pre-built invoice extraction API with groundbreaking deep learning technology that combines the text with the structure and visual information of the input documents. Similar to BERT, the model is pre-trained on a large quantity of documents. However, while BERT leverages only the text information, our pre-training also includes structure and visual information. After pre-training, the model is fine-tuned with carefully labeled invoices to achieve high accuracy on all fields and line items. The figure below shows a selected set of benchmark results on an internal invoice data set that contains 460 invoices never seen during training.
Easy and simple to use
Get started with the Form Recognizer Sample Tool to extract invoice data from your documents.
Using the API is as simple as two calls, with no training or preprocessing needed. Just call the Analyze Invoice operation with your document (image, TIFF, or PDF file) as the input and extract the text, tables, invoice key-value pairs, and line items from your invoices. Form Recognizer Invoice supports multipage PDF and TIFF files, as well as the JPG, PNG, and BMP file formats.
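Because the document is sent as the request body, the caller has to tell the service which of the supported formats it is sending. The helper below is a minimal sketch of that step. The function name `content_type_for` and the extension-to-type mapping are illustrative assumptions that cover only the formats named above; they are not part of the Form Recognizer API.

```python
from pathlib import Path

# Assumed mapping from file extension to the Content-Type header value.
# It covers only the formats named in the text: PDF, TIFF, JPG, PNG, BMP.
CONTENT_TYPES = {
    ".pdf": "application/pdf",
    ".tif": "image/tiff",
    ".tiff": "image/tiff",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".bmp": "image/bmp",
}


def content_type_for(filename: str) -> str:
    """Return the Content-Type for an invoice file, or raise ValueError."""
    suffix = Path(filename).suffix.lower()
    if suffix not in CONTENT_TYPES:
        raise ValueError(f"Unsupported invoice file type: {filename!r}")
    return CONTENT_TYPES[suffix]
```

Rejecting unknown extensions up front avoids sending a request that the service would refuse anyway.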
Form Recognizer Analyze Invoice API
Step 1: The Analyze Invoice Operation:
https://{endpoint}/formrecognizer/v2.1/prebuilt/invoice/analyze[?includeTextDetails][&locale][&pages]
The Analyze Invoice call returns a response header field called Operation-Location. The Operation-Location value is a URL that contains the Result ID to be used in the next step.
Example Operation-Location value:
https://cognitiveservice/formrecognizer/v2.1-preview.3/prebuilt/invoice/analyzeResults/44a436324-fc4b-4387-aa06-090cfbf0064f
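Step 1 can be sketched in Python with only the standard library. This is a hedged illustration, not an official client: the function names are hypothetical, `endpoint` is assumed to be the host name of your resource, and the key is assumed to be passed in the `Ocp-Apim-Subscription-Key` header as is usual for Azure Cognitive Services.

```python
import urllib.parse
import urllib.request


def build_analyze_url(endpoint, include_text_details=False, locale=None, pages=None):
    """Build the Analyze Invoice URL with its optional query parameters."""
    base = f"https://{endpoint}/formrecognizer/v2.1/prebuilt/invoice/analyze"
    params = {}
    if include_text_details:
        params["includeTextDetails"] = "true"
    if locale:
        params["locale"] = locale
    if pages:
        params["pages"] = pages
    return base + ("?" + urllib.parse.urlencode(params) if params else "")


def result_id_from_operation_location(operation_location):
    """Return the last path segment of the Operation-Location URL."""
    path = urllib.parse.urlparse(operation_location).path
    return path.rstrip("/").rsplit("/", 1)[-1]


def submit_invoice(endpoint, key, data, content_type):
    """POST the document bytes and return the Operation-Location header.

    Assumption: the service accepts the request and answers with a 2xx
    status; urllib raises HTTPError for anything else.
    """
    request = urllib.request.Request(
        build_analyze_url(endpoint),
        data=data,
        method="POST",
        headers={
            "Content-Type": content_type,
            "Ocp-Apim-Subscription-Key": key,
        },
    )
    with urllib.request.urlopen(request) as response:
        return response.headers["Operation-Location"]
```

The Result ID is simply the final path segment of the returned URL, so it can be extracted without any knowledge of the API version embedded in that URL.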
Once you have the operation location, call the Get Analyze Invoice Result operation. This operation takes as input the Result ID that was created by the Analyze Invoice operation. It returns a JSON response that contains a status field with one of the following values: notStarted, running, succeeded, or failed.
Step 2: The Get Analyze Invoice Result Operation:
https://{endpoint}/formrecognizer/v2.1/prebuilt/invoice/analyzeResults/{resultId}
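Since analysis runs asynchronously, Step 2 usually means calling this URL repeatedly until the status field reports completion. The sketch below shows one way to do that. It assumes the status values `notStarted`, `running`, `succeeded`, and `failed`; the function names, polling interval, and attempt limit are illustrative choices, not service requirements.

```python
import json
import time
import urllib.request


def get_result_url(endpoint, result_id):
    """Build the Get Analyze Invoice Result URL for a given Result ID."""
    return (
        f"https://{endpoint}/formrecognizer/v2.1"
        f"/prebuilt/invoice/analyzeResults/{result_id}"
    )


def fetch_json(url, key):
    """GET the URL with the subscription key header and decode the JSON body."""
    request = urllib.request.Request(
        url, headers={"Ocp-Apim-Subscription-Key": key}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_result(fetch, interval_seconds=1.0, max_attempts=60, sleep=time.sleep):
    """Call fetch() until the status is final, then return the response body.

    fetch is any zero-argument callable returning the decoded JSON, for
    example: lambda: fetch_json(get_result_url(endpoint, result_id), key)
    """
    for _ in range(max_attempts):
        body = fetch()
        status = body.get("status")
        if status == "succeeded":
            return body
        if status == "failed":
            raise RuntimeError("Invoice analysis failed")
        # notStarted or running: wait, then ask again.
        sleep(interval_seconds)
    raise TimeoutError("Invoice analysis did not finish in time")
```

Passing the fetch step in as a callable keeps the polling loop independent of the HTTP layer, which also makes it easy to exercise without network access.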
When the analysis has succeeded, the Get Analyze Invoice Result operation returns a JSON response containing the extracted invoice fields and line items.
For example:
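As a hedged illustration of reading such a response, the sketch below flattens the extracted fields into plain Python values. The `SAMPLE_RESULT` payload is hand-written for this example and is not real service output. The layout it assumes (`analyzeResult.documentResults[0].fields`, with typed `value...` keys) and the field names `InvoiceId`, `InvoiceTotal`, and `Items` are assumptions to be checked against the v2.1 API reference.

```python
# Hand-written, illustrative payload; NOT actual service output.
SAMPLE_RESULT = {
    "status": "succeeded",
    "analyzeResult": {
        "documentResults": [{
            "fields": {
                "InvoiceId": {"type": "string", "valueString": "INV-100",
                              "text": "INV-100", "confidence": 0.97},
                "InvoiceTotal": {"type": "number", "valueNumber": 110.0,
                                 "text": "$110.00", "confidence": 0.95},
                "Items": {"type": "array", "valueArray": [
                    {"type": "object", "valueObject": {
                        "Description": {"type": "string",
                                        "valueString": "Consulting",
                                        "text": "Consulting"},
                        "Quantity": {"type": "number", "valueNumber": 2,
                                     "text": "2"},
                        "Amount": {"type": "number", "valueNumber": 110.0,
                                   "text": "$110.00"},
                    }},
                ]},
            },
        }],
    },
}


def field_value(field):
    """Convert one typed field into a plain Python value, recursively."""
    if field is None:
        return None
    kind = field.get("type")
    if kind == "array":
        return [field_value(item) for item in field.get("valueArray", [])]
    if kind == "object":
        return {name: field_value(sub)
                for name, sub in field.get("valueObject", {}).items()}
    if not kind:
        return field.get("text")
    # Assumed naming rule: "string" -> "valueString", "number" -> "valueNumber".
    key = "value" + kind[0].upper() + kind[1:]
    return field.get(key, field.get("text"))


def extract_invoice(result):
    """Return {field name: value} for the first document in the result."""
    documents = result.get("analyzeResult", {}).get("documentResults", [])
    if not documents:
        return {}
    fields = documents[0].get("fields", {})
    return {name: field_value(field) for name, field in fields.items()}


invoice = extract_invoice(SAMPLE_RESULT)
for item in invoice["Items"]:
    print(item["Description"], item["Quantity"], item["Amount"])
```

Falling back to the raw `text` when no typed value is present keeps the helper usable for fields the service could not normalize.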
Get Started
- Create a Form Recognizer resource in Azure.
- Try it out in the Form Recognizer Sample Tool by following the QuickStart.
- Follow our SDK and REST API QuickStarts.
- Learn more about Form Recognizer Invoices and Form Recognizer.
- Write to us at formrecog_contact@microsoft.com