In our benchmarks, adding just a few labeled examples improved field extraction F1-Scores by up to 40%. Here's how to get similar results with Azure Content Understanding.
The Promise (and Pain) of Automated Document Processing
Every enterprise runs on documents. Invoices, contracts, tax forms, medical records, insurance claims…the list never ends. Intelligent Document Processing (IDP) promises to eliminate the manual work of reading, interpreting, and entering data from these documents. But anyone who has tried to automate document workflows at scale knows the reality: it works great — until it doesn't.
The problem isn't the AI. It's the documents. Layouts shift between vendors. Field names change across regions. A "Total Amount" on one invoice is an "Amount Due" on another. Even within a single organization, the same document type can arrive in dozens of variations.
These inconsistencies cause missed fields, incorrect extractions, and broken downstream workflows — often requiring more effort to fix than the manual process they replaced.
So how do you close the gap between "mostly works" and true straight-through processing?
Meet Azure Content Understanding
Azure Content Understanding (CU) is a generative AI service within Microsoft Foundry that transforms unstructured content into clean, structured, and actionable data. CU extends the document extraction capabilities of Azure Document Intelligence to unstructured and hybrid document types, as well as multimodal content including images, audio, and video.
Here’s how it works:
- Schema definition. You define the fields that matter to your business – e.g., “Invoice Number,” “Line-Item Total,” “Patient Name” – and describe each one in natural language.
- Field extraction. CU’s analyzer finds and extracts those fields from your documents, even when layouts vary.
- Confidence scores. Every extracted value comes with a confidence rating, so you know exactly when to trust the output and when to flag it for review.
- Grounding with citations. CU cites the source location for each extracted field, so every answer is traceable back to the original document.
- Multimodal support. CU processes text, PDFs, Word documents, images, audio, and video, all through a single, unified platform.
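The schema-definition step above can be sketched as a plain payload of field names plus natural-language descriptions. The field names and the overall shape here are illustrative assumptions, not CU's exact wire format — check the Content Understanding API reference for the current schema.

```python
# A minimal sketch of an extraction schema: descriptive field names
# plus plain-language descriptions. The payload shape is an assumption
# for illustration, not the exact format the CU API expects.
invoice_schema = {
    "fields": {
        "invoice_number": {
            "type": "string",
            "description": "The unique identifier printed on the invoice, "
                           "often labeled 'Invoice No.' or 'Reference'.",
        },
        "total_amount": {
            "type": "string",
            "description": "The final amount owed, labeled 'Total Amount' "
                           "or 'Amount Due' depending on the vendor.",
        },
    }
}

print(sorted(invoice_schema["fields"]))
```

Because the descriptions are plain language, the same schema can cover vendors that label the same logical field differently.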
Because CU is built on generative AI, it handles format variations far better than traditional rule-based or template-matching systems. But even generative AI can struggle with ambiguous layouts, unusual field arrangements, or domain-specific terminology.
That's where labeled examples come in.
Your Toolkit for Extraction Quality
CU provides a layered set of tools to improve extraction quality. Start simple and add more guidance only where needed:
- Field names. Clear, descriptive field names help CU understand what to look for. “applicant_date_of_birth” is far more effective than “field_3.”
- Field descriptions. Plain-language descriptions give additional context. For example: “The applicant’s age, calculated from their date of birth and the application signature date” is much more effective than just “Age.”
- Labeled examples. When names and descriptions aren’t enough – because the layout is complex, the field is ambiguous, or the document structure is unusual – you can provide labeled examples as a knowledge source. CU’s analyzer uses these examples at inference time to improve accuracy. Note that labeled examples add input tokens and processing cost, so start with schema improvements and add examples only where they make a measurable difference.
Think of it like onboarding a new team member – sometimes a well-written guidebook is enough, but for tricky cases, you need to walk them through a real example.
When Do Labeled Examples Help Most?
Not every field needs labeled examples. CU handles many fields well with just names and descriptions. Examples are most valuable in two scenarios:
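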
Scenario 1: Multiple Document Templates
Your business processes the same logical document type – say, receipts – but they arrive in wildly different formats. A restaurant receipt has a simple, tabular layout. A food delivery receipt has detailed line-item breakdowns with fees and tips. An email confirmation for an online subscription looks nothing like either one.
Example Contoso receipts with very different formats and field schemas.
A single schema definition can’t capture all these structural variations. By labeling a few representative documents from each template, you teach the analyzer how the same fields appear across different layouts.
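One way to keep track of template coverage is a simple mapping from each template to its labeled samples, with a check for templates that still lack one. This bookkeeping structure is a hypothetical sketch, not CU's knowledge-source format.

```python
# Hypothetical bookkeeping: one entry per receipt template, each
# listing its annotated sample documents. The paths and structure are
# illustrative assumptions, not CU's knowledge-source format.
labeled_samples = {
    "restaurant_receipt": ["samples/restaurant_001.json"],
    "delivery_receipt": ["samples/delivery_004.json"],
    "subscription_email": [],  # not labeled yet
}

# Flag templates that still need at least one labeled example.
missing = [t for t, samples in labeled_samples.items() if not samples]
print(missing)
```

Running a check like this before deployment makes the "at least one example per template" rule easy to enforce.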
Scenario 2: Complex or Ambiguous Fields
Some fields are genuinely hard to describe in words:
- A tax form where the “Taxable Amount” sits in a dense grid of similar-looking numbers, and the analyzer can’t determine which value to extract.
- Fields with domain-specific meaning that a plain-language description can’t fully convey.
In these cases, a labeled example is worth a thousand words of description.
Measuring the Impact: Benchmark Results
We evaluated CU’s analyzer across 45 documents spanning five real-world categories: tax forms, ethics review documents, legal documents, medical documents, and employment applications.
Accuracy Improvements
Without labeled examples (zero-shot), the analyzer struggled with specific fields in each category – particularly those with ambiguous positioning or complex layouts. After adding labeled examples, F1-Scores improved across the board.
F1-Scores compare extraction accuracy with and without labeled examples across five document categories. Dark blue bars show scores with labeled samples; light blue bars show scores without.
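For readers who want to reproduce this kind of measurement, a field-level F1 can be computed by treating each correctly extracted field as a true positive. The scoring rule below (exact string match per field) is one reasonable choice, not necessarily the one used in our benchmark.

```python
def field_f1(predicted: dict, ground_truth: dict) -> float:
    """Field-level F1: a predicted field counts as a true positive
    only if its value exactly matches the ground truth."""
    tp = sum(1 for k, v in predicted.items() if ground_truth.get(k) == v)
    fp = len(predicted) - tp
    fn = sum(1 for k in ground_truth if ground_truth[k] != predicted.get(k))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Illustrative values, not from the benchmark dataset.
pred = {"taxable_amount": "$35,118", "gross_distribution": "$40,000"}
truth = {"taxable_amount": "$35,118", "gross_distribution": "$41,000"}
print(round(field_f1(pred, truth), 2))
```

Stricter or fuzzier matching (normalized currency, case-insensitive text) changes the absolute numbers, so keep the rule fixed when comparing zero-shot and labeled runs.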
Walkthrough: Improving Ambiguous Extractions with Labeled Examples
Let's look at a concrete scenario.
The Problem
In zero-shot mode, CU was asked to extract the “Taxable Amount” from the IRA distributions section of a 1040 tax form. The form has multiple numeric fields in close proximity: gross distributions, taxable amounts, and various line items in a similar visual format. Without guidance, the analyzer returned a low-confidence, empty result.
The Improvement
We labeled one document by highlighting the correct “Taxable Amount” field and confirmed its value. The analyzer began extracting the correct value with higher confidence – for both the test document and other similar tax forms.
Before adding any samples:

```json
{
  "Taxable_Amount": {
    "type": "string",
    "confidence": 0.6
  }
}
```

After adding one sample:

```json
{
  "Taxable_Amount": {
    "type": "string",
    "valueString": "$35,118 1",
    "spans": [
      { "offset": 6794, "length": 7 },
      { "offset": 6811, "length": 1 }
    ],
    "confidence": 0.998,
    "source": "D(3,7.0763,1.5199,7.4701,1.5206,7.4701,1.6242,7.0761,1.6230);D(3,7.7150,1.5251,7.7727,1.5250,7.7727,1.6138,7.7150,1.6160)"
  }
}
```
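Downstream code can consume a field result like the one above by reading the value, its confidence, and the grounding spans that tie it back to the source document. The payload below copies the relevant keys from that run; the parsing logic is a sketch, not an official SDK helper.

```python
import json

# The "Taxable Amount" field result, pared down to the keys this
# sketch uses (values copied from the run shown above).
payload = """
{
  "Taxable_Amount": {
    "type": "string",
    "valueString": "$35,118 1",
    "spans": [
      {"offset": 6794, "length": 7},
      {"offset": 6811, "length": 1}
    ],
    "confidence": 0.998
  }
}
"""
field = json.loads(payload)["Taxable_Amount"]

# Grounding: the spans index back into the source text, so every
# extracted value is traceable to the original document.
total_grounded_chars = sum(s["length"] for s in field["spans"])
print(field["valueString"], field["confidence"], total_grounded_chars)
```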
What Labeled Examples Can (and Can’t) Do
Labeled Examples Help With
- Resolving layout ambiguity. Showing the analyzer which value to extract when multiple candidates are visually similar.
- Handling multiple templates. Demonstrating how the same field appears across different document formats.
- Complex field patterns. Guiding extraction for fields with unusual positioning or structure.
Start with Field Descriptions When
- The field's meaning is ambiguous. If the analyzer extracts from the wrong source, first try improving your description. For example, changing “Age” to “The applicant’s age, calculated from their date of birth and the application signature date” can resolve the issue.
- The field requires calculation or inference. For a field like “unit price” where the document only shows quantity and total, a precise description – e.g. “The price for a single item, calculated by dividing the total by the quantity” – tells the analyzer what to compute.
When description alone doesn’t address the issue, consider adding labeled samples.
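For computed fields like the "unit price" example above, a cheap post-processing check can confirm that the inferred value is arithmetically consistent with the numbers on the document. This helper is hypothetical, not part of CU.

```python
# Hypothetical consistency check for a computed field: verify that the
# extracted unit price matches total / quantity within a tolerance.
def unit_price_consistent(total: float, quantity: int,
                          extracted_unit_price: float,
                          tolerance: float = 0.01) -> bool:
    """True if extracted_unit_price agrees with total / quantity."""
    if quantity == 0:
        return False
    return abs(total / quantity - extracted_unit_price) <= tolerance

print(unit_price_consistent(total=59.97, quantity=3,
                            extracted_unit_price=19.99))
```

A failed check is a good trigger for human review, since it means either the extraction or the source document is internally inconsistent.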
Labeled Examples Cannot Fix
- Upstream OCR errors. If poor image quality causes character recognition errors (e.g., "$50.00" read as "5 50.00"), no number of examples will fix it. Address image quality and OCR configuration at the source.
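While examples cannot repair OCR damage, a simple format check can at least catch it before it reaches downstream systems. The regex below is an illustrative sanity check for US-style currency strings, not a general-purpose validator.

```python
import re

# Illustrative sanity check: flag values that don't look like currency,
# so OCR damage (e.g. "$50.00" read as "5 50.00") is routed to review
# instead of silently entering downstream systems.
CURRENCY = re.compile(r"^\$\d{1,3}(,\d{3})*(\.\d{2})?$")

def looks_like_currency(value: str) -> bool:
    return bool(CURRENCY.match(value.strip()))

print(looks_like_currency("$50.00"), looks_like_currency("5 50.00"))
```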
Best Practices
Based on our benchmarks and real-world deployments, here are the practices that deliver the best results:
1. Start with Your Schema
Before adding any labeled examples, ensure field names are clear and descriptions are specific. Many extraction issues can be resolved at this stage.
2. Identify Your Templates
Group documents by structural similarity – same layout, field positions, and terminology. Aim for at least one labeled example per template.
3. Choose Representative Examples
Pick documents typical of their template, not edge cases, to show the common patterns.
4. Label Your Data
Use the built-in labeling experience in CU Studio to annotate your examples. The Studio auto-populates field labels and lets you confirm or edit values and tags. From there, use CU Studio, API, or SDK to create and run custom analyzers with your labeled data.
5. Monitor Confidence Scores
After adding examples, review confidence scores on your test documents. Both accuracy and confidence should improve. Use confidence thresholds to route documents between auto-processing and human review.
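The threshold routing described above can be sketched in a few lines. The 0.9 cutoff is an illustrative assumption — tune it against your own accuracy targets and review budget.

```python
# Sketch of confidence-threshold routing. The cutoff is an assumption
# for illustration; calibrate it on your own validation set.
AUTO_THRESHOLD = 0.9

def route(fields: dict) -> str:
    """Send a document to straight-through processing only if every
    extracted field clears the confidence threshold."""
    if all(f["confidence"] >= AUTO_THRESHOLD for f in fields.values()):
        return "auto"
    return "human_review"

doc = {
    "invoice_number": {"valueString": "INV-1042", "confidence": 0.97},
    "total_amount": {"valueString": "$1,250.00", "confidence": 0.62},
}
print(route(doc))
```

Per-field thresholds (strict for financial amounts, looser for free-text notes) are a natural refinement of this all-or-nothing rule.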
6. Iterate Incrementally
Start with a small number of examples. Add more only where accuracy or confidence remains below your threshold.
Conclusion
Automated document processing doesn’t have to mean “automated except for the hard parts.” With Azure Content Understanding, you can close the accuracy gap on complex, variable documents – not by retraining models or writing rules, but by providing well-chosen labeled examples.
Our benchmarks show meaningful F1-Score improvements across document categories. That translates to fewer human reviews, faster processing, and more reliable downstream data.
Start today! Azure Content Understanding is generally available in Microsoft Foundry. To learn more, visit the documentation or try it out for yourself.