Mistral Document AI (available as a serverless model in Azure AI Foundry) brings multimodal, layout‑aware document understanding directly into a developer workflow without provisioning GPUs. Instead of receiving only raw text from Optical Character Recognition (OCR), you obtain markdown plus optional structured annotations that preserve tables, headings, figures, and multilingual content. This structural preservation enables more accurate and reliable downstream automation for AI agents, RAG workflows, and compliance tasks.
Mistral Document AI
Mistral Document AI combines vision and language understanding to interpret complex page layouts. Compared to traditional OCR approaches, it maintains structural semantics: tables remain tables, figures are preserved, headings stay delineated, and mathematical or LaTeX elements are not flattened into ambiguous text. The model supports 25+ languages and handles PDFs and images (including scanned, mixed‑content pages) with high accuracy. Published benchmarks indicate better layout fidelity and text extraction accuracy than several large general models and legacy OCR engines.
Key capabilities:
- Robust layout understanding. Detects document regions (headings, paragraphs, lists, tables, and figures) and preserves their relationships so downstream code can treat them as structured objects instead of flat text.
- Preserves document structure and returns markdown plus optional JSON annotations (bounding boxes, fields, or custom schema).
- Broad language and style support. Accurately extracts text across many languages and common font/handwriting variations, improving reliability on scanned, photocopied, or noisy documents.
- Doc‑as‑prompt support. For agent systems that require clean, structured output, you can pass the extracted document content directly to tool invocations, RAG pipelines, or agents for extraction, classification, or summarization.
- Serverless, low‑latency processing designed for single‑document and batch workflows.
Assume you own a collection of old handwritten or scanned family recipes, stored as PDFs and images and passed down through generations. You want to digitize these recipes into a structured format for easy access, for sharing with relatives, and for use in meal planner and shopping list generation tools.
Your pipeline may look like this:
- Ingest document (image or PDF) → call Mistral Document AI → receive markdown/JSON pages.
- Parse markdown to extract key information: recipe title, ingredients, cooking instructions, cooking time, and so on.
- Derive a shopping list.
- Feed structured output into a downstream agent (e.g., to suggest ingredient substitutions or generate a consolidated weekly shopping list).
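The "derive a shopping list" step above can be sketched as follows. This is a minimal illustration, not the sample's actual implementation: the `Ingredient` and `Recipe` shapes and the `deriveShoppingList` function are hypothetical names, and the sketch assumes each ingredient has already been split into a name, a numeric quantity, and a unit.

```typescript
// Hypothetical shapes for illustration; the service does not return these directly.
interface Ingredient {
  name: string;     // e.g. "flour"
  quantity: number; // numeric amount
  unit: string;     // e.g. "g", "cup", or "" for countable items
}

interface Recipe {
  title: string;
  ingredients: Ingredient[];
  steps: string[];
}

// Merge ingredients across recipes, summing quantities that share a name and unit.
// Names and units are trimmed and lowercased so "Flour" and "flour " are merged.
function deriveShoppingList(recipes: Recipe[]): Ingredient[] {
  const merged = new Map<string, Ingredient>();
  for (const recipe of recipes) {
    for (const item of recipe.ingredients) {
      const name = item.name.trim().toLowerCase();
      const unit = item.unit.trim().toLowerCase();
      const key = `${name}|${unit}`;
      const existing = merged.get(key);
      if (existing) {
        existing.quantity += item.quantity;
      } else {
        merged.set(key, { name, unit, quantity: item.quantity });
      }
    }
  }
  // Sort alphabetically so the list is stable and easy to scan.
  return Array.from(merged.values()).sort((a, b) => a.name.localeCompare(b.name));
}
```

The same ingredient in different units (for example grams and cups) is kept as separate entries here; unit conversion would be an additional step.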
How to use it (TypeScript)
Prerequisites
- An Azure AI Foundry project
- A mistral-document-ai-2505 model deployment
Set your endpoint and API key as environment variables. The full sample will be referenced at the end of this blog, but here’s a minimal explanation of the core logic.
Demo scenario: Recipe PDF/Image → Structured Data → Shopping List
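One way to read these settings is sketched below. The variable names `AZURE_DOCUMENT_AI_ENDPOINT`, `AZURE_DOCUMENT_AI_KEY`, and `AZURE_DOCUMENT_AI_MODEL` are assumptions made for this illustration, as is the `loadConfig` helper; use whatever names your own project defines.

```typescript
// Hypothetical config shape; the field names mirror the variables used in this post.
interface DocumentAIConfig {
  azureEndpoint: string;
  apiKey: string;
  modelName: string;
}

// Read settings from an environment-like object and fail early if any are missing.
// The variable names below are assumptions for this sketch.
function loadConfig(env: Record<string, string | undefined>): DocumentAIConfig {
  const azureEndpoint = env['AZURE_DOCUMENT_AI_ENDPOINT'];
  const apiKey = env['AZURE_DOCUMENT_AI_KEY'];
  if (!azureEndpoint || !apiKey) {
    throw new Error('Set AZURE_DOCUMENT_AI_ENDPOINT and AZURE_DOCUMENT_AI_KEY first');
  }
  // Fall back to the deployment name listed in the prerequisites.
  const modelName = env['AZURE_DOCUMENT_AI_MODEL'] ?? 'mistral-document-ai-2505';
  return { azureEndpoint, apiKey, modelName };
}
```

In a Node.js script this would typically be called as `loadConfig(process.env)`. Failing at startup avoids sending a request with an empty key.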
The code snippets below illustrate the key steps to process a recipe PDF using Mistral Document AI and extract structured data.
- First, encode the PDF as base64, because the endpoint does not accept a document URL or image URL directly.
- Then invoke the Mistral Document AI endpoint with the appropriate payload.
- Finally, save and read the output JSON for post-processing.
// pdfPath, apiKey, modelName, azureEndpoint, and DocumentAIResponse are defined in the full sample.
import * as fs from 'fs';
import axios from 'axios';

// Step 1: read the PDF from disk and encode it as base64
function encodePdfToBase64(pdfPath: string): string {
  const pdfContent = fs.readFileSync(pdfPath);
  return pdfContent.toString('base64');
}
const base64Content = encodePdfToBase64(pdfPath);

// Step 2: send the document to the endpoint as a base64 data URL
const headers = { 'Content-Type': 'application/json', Authorization: `Bearer ${apiKey}` };
const payload = {
  model: modelName,
  document: { type: 'document_url', document_url: `data:application/pdf;base64,${base64Content}` },
  include_image_base64: true,
};
const response = await axios.post<DocumentAIResponse>(azureEndpoint!, payload, { headers });
const result = response.data;

// Step 3: save the response, then read it back for post-processing
const outputFile = 'document_ai_result.json';
fs.writeFileSync(outputFile, JSON.stringify(result, null, 2), 'utf-8');
const jsonContent = fs.readFileSync(outputFile, 'utf-8');
Expected output:
After OCR, the next step is to extract structured recipe information (ingredients, cooking steps, cooking time) from the output and apply additional post-processing, such as normalization, to prepare the shopping list.
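A rough sketch of that extraction step follows. It rests on two assumptions that are not confirmed by this post: that the saved JSON contains a `pages` array whose entries have an `index` and a `markdown` field, and that the recipe markdown uses a level-one heading for the title plus sections whose headings mention "Ingredients" and "Instructions" (or similar). Real output from handwritten recipes will vary, so treat `joinPageMarkdown` and `parseRecipeMarkdown` as hypothetical helpers to adapt.

```typescript
// Assumed response shape: only the fields this sketch reads.
interface OcrPage { index: number; markdown: string; }
interface OcrPagesResponse { pages: OcrPage[]; }

interface ParsedRecipe { title: string; ingredients: string[]; steps: string[]; }

// Join page markdown in page order so a recipe that spans pages stays together.
function joinPageMarkdown(jsonContent: string): string {
  const parsed = JSON.parse(jsonContent) as Partial<OcrPagesResponse>;
  if (!Array.isArray(parsed.pages)) {
    throw new Error('Expected a "pages" array in the OCR result');
  }
  return parsed.pages
    .slice()
    .sort((a, b) => a.index - b.index)
    .map((page) => page.markdown)
    .join('\n\n');
}

// Walk the markdown line by line, tracking which section the current line belongs to.
function parseRecipeMarkdown(markdown: string): ParsedRecipe {
  const recipe: ParsedRecipe = { title: '', ingredients: [], steps: [] };
  let section: 'ingredients' | 'steps' | 'other' = 'other';
  for (const rawLine of markdown.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === '') continue;
    const heading = /^(#{1,6})\s+(.*)$/.exec(line);
    if (heading) {
      const [, hashes = '', text = ''] = heading;
      if (hashes.length === 1 && recipe.title === '') {
        recipe.title = text.trim(); // first level-one heading is taken as the title
        section = 'other';
      } else if (/ingredient/i.test(text)) {
        section = 'ingredients';
      } else if (/instruction|method|step|direction/i.test(text)) {
        section = 'steps';
      } else {
        section = 'other';
      }
      continue;
    }
    // Strip a leading bullet ("- ", "* ", "+ ") or list number ("1. ", "2) ").
    const item = line.replace(/^(?:[-*+]|\d+[.)])\s+/, '');
    if (section === 'ingredients') recipe.ingredients.push(item);
    else if (section === 'steps') recipe.steps.push(item);
  }
  return recipe;
}
```

Calling `parseRecipeMarkdown(joinPageMarkdown(jsonContent))` yields a plain object that can be normalized further or handed to a downstream agent. Lines outside the recognized sections are ignored, which keeps the parser tolerant of notes and page furniture in scanned documents.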
With your data in structured JSON, you can now build applications or agents that leverage this knowledge. Here is an example of a simple app (Web Components) that implements the above end-to-end flow, allowing users to upload recipe documents and see extracted structured data and shopping lists.
Security, Privacy & Compliance
Running Mistral Document AI inside Azure AI Foundry provides:
- Regional data residency as documents are processed inside your selected Azure region. Data is not forwarded to external third‑party endpoints beyond Azure’s managed hosting.
- Enterprise isolation and governance alignment with standard Azure controls.
- Ability to apply Responsible AI tooling (content filtering, monitoring) to downstream agent operations.
- Content safety filters applied to document annotation outputs. Note that OCR output itself does not have content safety enforcement by default.
- Selective self‑hosting for highly sensitive workloads, so that your data remains fully within your controlled environment.
Mistral Document AI enables a higher‑quality foundation for document‑driven AI features: better fidelity, multilingual support, structured outputs, and integration into Azure AI operational tooling. By preserving layout and enabling structured extractions, it reduces custom parsing overhead and accelerates agent and RAG development. The family recipe preservation scenario illustrates how unstructured cultural artifacts become actionable digital knowledge.