Form Recognizer is an Applied AI service that provides pre-built or custom models to extract information from forms and documents. The Form Recognizer service continues to innovate by expanding document types and AI quality to enable you to maximize your use of AI to process documents at scale.
Form Recognizer continues to improve its capabilities with better models, support for additional document types, and containerized solutions that run in the cloud or on premises. Containers can run connected or fully disconnected, for scenarios that require an isolated environment. Recent pricing updates include commitment tiers for customers who have a predictable volume of documents. Starting February 15th, the price of the Invoices and General Document APIs drops to $10 per 1,000 pages, an 80% reduction, so customers can use both APIs in high-volume scenarios at significantly lower cost.
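To make the pricing figures concrete, here is a small back-of-the-envelope sketch. It assumes simple linear pay-as-you-go billing and ignores commitment tiers and any free allowance; the function name and the 250,000-page volume are illustrative only. The previous price is not stated in this post and is derived from the two published figures.

```python
from fractions import Fraction

NEW_PRICE_PER_1000_PAGES = Fraction(10)  # USD, from the announcement
REDUCTION = Fraction(80, 100)            # 80% reduction, from the announcement

# Previous price implied by the two figures: new = old * (1 - reduction)
implied_old_price = NEW_PRICE_PER_1000_PAGES / (1 - REDUCTION)


def estimated_cost(pages, price_per_1000=NEW_PRICE_PER_1000_PAGES):
    """Rough USD estimate, assuming cost scales linearly with page count."""
    return Fraction(pages, 1000) * price_per_1000


if __name__ == "__main__":
    print("Implied previous price per 1,000 pages:", float(implied_old_price))
    print("250,000 pages at the new price:", float(estimated_cost(250_000)))
    print("250,000 pages at the implied old price:",
          float(estimated_cost(250_000, implied_old_price)))
```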
Organizations today deal with vast quantities of unstructured documents including contracts, financial or medical reports and publications. Processing these unstructured documents with AI to extract the right fields by relying on semantics improves decision making and time to value.
The neural (custom document) model is a new deep-learned model that extracts fields from structured and unstructured documents. It shares the same labeling approach as the existing custom form (template) models, and you can start training with as few as five labeled documents. With a common labeling format, it’s easy to take your existing template or custom form project and train a neural model. When dealing with variations, simply add a few samples of each variation to the training dataset, as neural models generalize well across variations.
Get started with neural models today:
POST https://{{Service}}.cognitiveservices.azure.com/formrecognizer/documentModels:build?api-version=2022-01-30-preview
{
    "modelId": "model-name",
    "description": "Trained via the rest API",
    "buildMode": "neural",
    "azureBlobSource": {
        "containerUrl": "{{SAS URL to container}}",
        "prefix": "{path to training dataset within container}/"
    },
    "tags": {
        "createdBy": "rest-api"
    }
}
POST https://{{Service}}.cognitiveservices.azure.com/formrecognizer/documentModels/{{model_id}}:analyze?api-version=2022-01-30-preview
{
    "urlSource": "SAS URL to document"
}
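The build request above can also be assembled from a script. The following is a minimal Python sketch using only the standard library, not an official sample. The function name, its parameters, and the placeholder values are illustrative assumptions; the key is passed in the `Ocp-Apim-Subscription-Key` header, the usual key header for Cognitive Services resources.

```python
import json
import urllib.request

API_VERSION = "2022-01-30-preview"  # version used in the requests above


def build_model_request(endpoint, key, model_id, container_sas_url, prefix=""):
    """Assemble (but do not send) the documentModels:build request.

    Parameter names are illustrative; the body mirrors the JSON above.
    """
    url = (
        f"{endpoint.rstrip('/')}/formrecognizer/documentModels:build"
        f"?api-version={API_VERSION}"
    )
    body = {
        "modelId": model_id,
        "description": "Trained via the REST API",
        "buildMode": "neural",
        "azureBlobSource": {"containerUrl": container_sas_url, "prefix": prefix},
        "tags": {"createdBy": "rest-api"},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": key,
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values: substitute your own resource, key, and SAS URL.
    request = build_model_request(
        "https://my-resource.cognitiveservices.azure.com",
        "<your-key>",
        "model-name",
        "<SAS URL to container>",
        prefix="training-data/",
    )
    print(request.get_method(), request.full_url)
    # To actually send it: urllib.request.urlopen(request)
```

Training is a long-running operation, so sending this request starts the build rather than returning a finished model. Consult the REST reference for how to check when the model is ready before calling analyze with its model ID.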
Form Recognizer language-specific SDKs offer developers an easy and efficient way to integrate Form Recognizer capabilities into native applications. Samples that train a custom model and analyze a document with it are available for the C# SDK, with similar examples for the Python, Java, and JavaScript SDKs.
For scenarios where the fields you need can be extracted by a pre-trained model, the general document model extracts key-value pairs or fields from a form or document with no training needed. The latest release of the general document model includes several updates, among them support for check boxes (selection marks) and improved key-value pair detection.
Try out the general document model in the Form Recognizer Studio with either a sample document or one of your own documents. The reduction in price for the general document API makes it ideal for a wide variety of documents!
Get started with general document model today:
The new v3 API makes it easy to try out the different prebuilt models by simply swapping out the model ID in the URL. To test the general document model:
POST https://{{Service}}.cognitiveservices.azure.com/formrecognizer/documentModels/prebuilt-document:analyze?api-version=2022-01-30-preview
{
    "urlSource": "SAS URL to document"
}
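Because only the model ID changes between prebuilt models, the analyze URL can be produced by one small helper. This is a hedged sketch: `analyze_url` and its parameters are hypothetical names, and the model IDs are the ones that appear in the requests in this post.

```python
from urllib.parse import quote, urlencode


def analyze_url(endpoint, model_id, api_version="2022-01-30-preview", **query):
    """Return the analyze URL for a model ID (hypothetical helper).

    Extra keyword arguments, such as locale="es", become query parameters
    placed before api-version.
    """
    params = dict(query)
    params["api-version"] = api_version
    path = f"/formrecognizer/documentModels/{quote(model_id, safe='')}:analyze"
    return f"{endpoint.rstrip('/')}{path}?{urlencode(params)}"


if __name__ == "__main__":
    endpoint = "https://my-resource.cognitiveservices.azure.com"  # placeholder
    # Swapping the model ID is the only change between prebuilt models.
    for model_id in ("prebuilt-document", "prebuilt-tax.us.w2", "prebuilt-read"):
        print(analyze_url(endpoint, model_id))
    print(analyze_url(endpoint, "prebuilt-invoice", locale="es"))
```

The request body stays the same for every model: a JSON object with a `urlSource` that points to the document.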
As tax season approaches in the United States, the new W-2 tax form model enables tax processing and other income verification scenarios. The new prebuilt W-2 model makes processing a W-2 form as simple as calling an API. The W-2 model can handle the different variations in formats to accurately extract the form fields from each document.
Get started with the W-2 model today:
POST https://{{Service}}.cognitiveservices.azure.com/formrecognizer/documentModels/prebuilt-tax.us.w2:analyze?api-version=2022-01-30-preview
{
    "urlSource": "SAS URL to document"
}
Form Recognizer now supports Spanish-language invoices! The invoice prebuilt model enables a number of common procurement scenarios, and Spanish-language support extends these to other geographies and scenarios. The invoice prebuilt model now recognizes additional fields, including:
• CustomerTaxId
• VendorTaxId
• PaymentTerms
• TotalVAT
• Line/VAT
Get started with the invoices model today:
POST https://{{Service}}.cognitiveservices.azure.com/formrecognizer/documentModels/prebuilt-invoice:analyze?locale=es&api-version=2022-01-30-preview
{
    "urlSource": "SAS URL to document"
}
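Once an invoice has been analyzed, the new fields can be read from the result. The sketch below assumes the result JSON places extracted fields under `analyzeResult.documents[].fields`, each with `content` and `confidence` entries; that shape, the helper name, and the sample values are assumptions for illustration, so check them against a real response. The Line/VAT entry is left out because its position in the response is not described here.

```python
# Document-level field names taken from the list above.
NEW_INVOICE_FIELDS = ("CustomerTaxId", "VendorTaxId", "PaymentTerms", "TotalVAT")


def pick_fields(result, names=NEW_INVOICE_FIELDS):
    """Return {field name: (content, confidence)} for the fields that are present.

    Assumes the result shape described in the lead-in; missing fields are skipped.
    """
    found = {}
    for doc in result.get("analyzeResult", {}).get("documents", []):
        fields = doc.get("fields", {})
        for name in names:
            field = fields.get(name)
            if field is not None and name not in found:
                found[name] = (field.get("content"), field.get("confidence"))
    return found


if __name__ == "__main__":
    # Made-up sample result, not real service output.
    sample_result = {
        "status": "succeeded",
        "analyzeResult": {
            "documents": [
                {
                    "fields": {
                        "VendorTaxId": {"content": "ESB12345678", "confidence": 0.95},
                        "TotalVAT": {"content": "21,00", "confidence": 0.9},
                    }
                }
            ]
        },
    }
    for name, (content, confidence) in pick_fields(sample_result).items():
        print(name, content, confidence)
```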
The latest update to Form Recognizer v3.0 preview adds the new Read API. Read extracts text lines, words, their locations, detected languages, and handwritten style if detected from documents and images. Language detection is at the text line level. Read will output the language code with the highest internal confidence score for the extracted text lines. To learn more, please refer to the Read article.
POST https://{{Service}}.cognitiveservices.azure.com/formrecognizer/documentModels/prebuilt-read:analyze?api-version=2022-01-30-preview
{
    "urlSource": "SAS URL to document"
}
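The text that Read extracts can be walked page by page. This is a minimal sketch that assumes the result JSON lists pages under `analyzeResult.pages`, each with a `pageNumber` and a `lines` array whose items carry a `content` string; the helper name and the sample data are made up for illustration. Detected languages and handwritten style are not parsed here.

```python
def text_lines(result):
    """Return a list of (page number, line text) pairs from a Read result.

    Assumes the result shape described in the lead-in; anything missing is skipped.
    """
    lines = []
    for page in result.get("analyzeResult", {}).get("pages", []):
        for line in page.get("lines", []):
            lines.append((page.get("pageNumber"), line.get("content", "")))
    return lines


if __name__ == "__main__":
    # Made-up sample result, not real service output.
    sample_result = {
        "analyzeResult": {
            "pages": [
                {"pageNumber": 1, "lines": [{"content": "Hello"}, {"content": "world"}]},
                {"pageNumber": 2, "lines": [{"content": "Second page"}]},
            ]
        }
    }
    for page_number, text in text_lines(sample_result):
        print(page_number, text)
```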
Form Recognizer Read, Layout, and Custom Form add support for 42 new languages, including Arabic, Hindi, and other languages that use the Arabic and Devanagari scripts, expanding coverage to 164 languages. Handwritten support for the same features expands to Japanese and Korean, in addition to English, Chinese Simplified, French, German, Italian, Portuguese, and Spanish. Please refer to the language support article for the full list.
Form Recognizer continues to improve AI quality and service performance. If you have any questions or feedback on either the preview APIs or the service, please contact us via email.