microsoft foundry

46 Topics

Stop Drawing Architecture Diagrams Manually: Meet the Open-Source AI Architecture Review Agents
Designing and documenting software architecture is often a battle against static diagrams that become outdated the moment they are drawn. The Architecture Review Agent changes that by turning your design process into a dynamic, AI-powered workflow. In this post, we explore how to leverage Microsoft Foundry Hosted Agents, Azure OpenAI, and Excalidraw to build an open-source tool that instantly converts messy text descriptions, YAML, or README files into editable architecture diagrams. Beyond just drawing boxes, the agent acts as a technical co-pilot, delivering prioritized risk assessments, highlighting single points of failure, and mapping component dependencies. Discover how to eliminate manual diagramming, catch security flaws early, and deploy your own enterprise-grade agent with zero infrastructure overhead.
ShivamGoyal
Mar 11, 2026 Place Educator Developer Blog
5.9KViews
5likes
5Comments
Announcing Fireworks AI on Microsoft Foundry
We’re excited to announce that starting today, Microsoft Foundry customers can access high performance, low latency inference performance of popular open models hosted on the Fireworks cloud from their Foundry projects, and even deploy their own customized versions, too! As part of the Public Preview launch, we’re offering the most popular open models for serverless inference in both pay-per-token (US Data Zone) and provisioned throughput (Global Provisioned Managed) deployments. This includes: Minimax M2.5 🆕 OpenAI’s gpt-oss-120b MoonshotAI’s Kimi-K2.5 DeepSeek-v3.2 For customers that have been looking for a path to production with models they’ve post-trained, you can now import your own fine-tuned versions of popular open models and deploy them at production scale with Fireworks AI on Microsoft Foundry. Serverless (pay-per-token) For customers wanting per-token pricing, we’re launching with Data Zone Standard in the United States. You can make model deployments for Foundry resources in the following regions: East US East US 2 Central US North Central US West US West US 3 Depending on your Azure subscription type, you’ll automatically receive either a 250K or 25K tokens per minute (TPM) quota limit per region and model. (Azure Student and Trial subscriptions will not receive quota at this time.) Per-token pricing rates include input, cached input, and output tokens priced per million tokens. Model Input Tokens ($/1M tokens) Cached Tokens ($/1M tokens) Output Tokens ($/1M tokens) gpt-oss-120b $0.17 $0.09 $0.66 kimi-k2.5 $0.66 $0.11 $3.30 deepseek-v3.2 $0.62 $0.31 $1.85 minimax-m2.5 $0.33 $0.03 $1.32 As we work together with Fireworks to launch the latest OSS models, the supported models will evolve as popular research labs push the frontier! Provisioned Throughput For customers looking to shift or scale production workloads on these models, we’re launching with support for Global provisioned throughput. (Data Zone support will be coming soon!) Provisioned throughput for Fireworks models works just like it does for Foundry models: PTUs are designed to deliver consistent performance in terms of time between token latency. Your existing quota for Global PTUs works as does any reservation commitments! gpt-oss-120b Kimi-K2.5 DeepSeek-v3.2 MiniMax-M2.5 Global provisioned minimum deployment 80 800 1,200 400 Global provisioned scale increment 40 400 600 200 Input TPM per PTU 13,500 530 1,500 3,000 Latency Target Value 99% > 50 Tokens Per Second^ 99% > 50 Tokens Per Second^ 99% > 50 Tokens Per Second^ 99% > 50 Tokens Per Second^ ^ Calculated as p50 request latency on a per 5 minute basis. Custom Models Have you post-trained a model like gpt-oss-120b for your particular use case? With Fireworks on Foundry you can deploy, govern, and scale your custom models all within your Foundry project. This means full fine-tuned versions of models from the following families can be imported and deployed as part of preview: Qwen3-14B OpenAI gpt-oss-120b Kimi K2 and K2.5 DeepSeek v3.1 and v3.2 The new Custom Models page in the Models experience lets you initiate the import process for copying your model weights into your Foundry project. > Models -> Custom Models. For performing a high-speed transfer of the files into Foundry, we’ve added a new feature to Azure Developer CLI (azd) for facilitating the transfer of a directory of model weights. The Foundry UI will give you cli arguments to copy and paste for quickly running azd ai models create pointed to your Foundry project. Enabling Fireworks AI on Microsoft Foundry in your Subscription While in preview, customers must opt-in to integrate their Microsoft Foundry resources with the Fireworks inference cloud to perform model deployments and send inference requests. Opt-in is self-service and available in the Preview features panel within your Azure portal. For additional details on finding and enabling the preview feature, please see the new product documentation for Fireworks on Foundry. Frequently Asked Questions How are Fireworks AI on Microsoft Foundry models different than Foundry Models? Models provided direct from Azure include some open-source models as well as proprietary models from labs like Black Forest Labs, Cohere, and xAI, and others. These models undergo rigorous model safety and risks assessments based on Microsoft’s Responsible AI standard. For customers needing the latest open-source models from emerging frontier labs, break-neck speed, or the ability to deploy their own post-trained custom models, Fireworks delivers best-in-class inference performance. Whether you’re focused on minimizing latency or just staying ahead of the trends, Fireworks AI on Microsoft Foundry gives you additional choice in the model catalog. Still need to quantify model safety and risk? Foundry provides a suite of observability tools with built-in risk and safety evaluators, letting you build AI systems confidently. How is model retirement handled? Customers using serverless per-token offers of models via Fireworks on Foundry will receive notice no less than 30 days before potential model retirement. You’ll be recommended to upgrade to either an equivalent, longer-term supported Azure Direct model or a newer model provided by Fireworks. For customers looking to use models beyond the retirement period, they may do so via Provisioned throughput deployments. How can I get more quota? For TPM quota, you may submit requests via our current Fireworks on Foundry quota form. For PTU quota, please contact your Microsoft account team. Can you support my custom model? Let’s talk! In general, if your model meets Fireworks’ current requirements, we have you covered. You can either reach out to your Microsoft account team or your contacts you may already have with Fireworks.
davevoutila
Mar 11, 2026 Place Microsoft Foundry Blog
1.2KViews
1like
0Comments
Introducing GPT-5.4 in Microsoft Foundry
Today, we’re thrilled to announce that OpenAI’s GPT‑5.4 is now generally available in Microsoft Foundry: a model designed to help organizations move from planning work to reliably completing it in production environments. As AI agents are applied to longer, more complex workflows; consistency and follow‑through become as important as raw intelligence. GPT‑5.4 combines stronger reasoning with built in computer use capabilities to support automation scenarios, and dependable execution across tools, files, and multi‑step workflows at scale. GPT-5.4: Enhanced Reliability in Production AI GPT-5.4 is designed for organizations operating AI in real production environments, where consistency, instruction adherence, and sustained context are critical to success. The model brings together advances in reasoning, coding, and agentic workflows to help AI systems not only plan tasks but complete them with fewer interruptions and reduced manual oversight. Compared with earlier generations, GPT-5.4 emphasizes stability across longer interactions, enabling teams to deploy agentic AI with greater confidence in day-to-day production use. GPT-5.4 introduces advancements that aim for production grade AI: More consistent reasoning over time, helping maintain intent across multi‑turn and multi‑step interactions Enhanced instruction alignment to reduce prompt tuning and oversight Latency improved performance for responsive, real-time workflows Integrated computer use capabilities for structured orchestration of tools, file access, data extraction, guarded code execution, and agent handoffs More dependable tool invocation reducing prompt tuning and human oversight Higher‑quality generated artifacts, including documents, spreadsheets, and presentations with more consistent structure Together, these improvements support AI systems that behave more predictably as tasks grow in length and complexity. From capability to real-world outcomes GPT‑5.4 delivers practical value across a wide range of production scenarios where follow‑through and reliability are essential: Agent‑driven workflows, such as customer support, research assistance, and business process automation Enterprise knowledge work, including document drafting, data analysis, and presentation‑ready outputs Developer workflows, spanning code generation, refactoring, debugging support, and UI scaffolding Extended reasoning tasks, where logical consistency must be preserved across longer interactions Teams benefit from reduced task drift, fewer mid‑workflow failures, and more predictable outcomes when deploying GPT‑5.4 in production. GPT-5.4 Pro: Deeper analysis for complex decision workflows GPT‑5.4 Pro, a premium variant designed for scenarios where analytical depth and completeness are prioritized over latency. Additional capabilities include: Multi‑path reasoning evaluation, allowing alternative approaches to be explored before selecting a final response Greater analytical depth, supporting problems with trade‑offs or multiple valid solutions Improved stability across long reasoning chains, especially in sustained analytical tasks Enhanced decision support, where rigor and thoroughness outweigh speed considerations Organizations typically select GPT‑5.4 Pro when deeper analysis is required such as scientific research and complex problems, while GPT‑5.4 remains the right choice for workloads that prioritize reliable execution and agentic follow‑through. Microsoft Foundry: Enterprise‑Grade Control from Day One GPT‑5.4 and GPT‑5.4 Pro are available through Microsoft Foundry, which provides the operational controls organizations need to deploy AI responsibly in production environments. Foundry supports policy enforcement, monitoring, version management, and auditability, helping teams manage AI systems throughout their lifecycle. By deploying GPT‑5.4 through Microsoft Foundry, organizations can integrate advanced agentic capabilities into existing environments while aligning with security, compliance, and operational requirements from day one. Customer Spotlight Get Started with GPT-5.4 in Microsoft Foundry GPT‑5.4 sets a new bar for production‑ready AI by combining stronger reasoning with dependable execution. Through enterprise‑grade deployment in Microsoft Foundry, organizations can move beyond experimentation and confidently build AI systems that complete complex work at scale. Computer use capabilities will be introduced shortly after launch. GPT‑5.4 <272K input tokens context length in Microsoft Foundry is priced at $2.50 per million input tokens, $0.25 per million cached input tokens, and $15.00 per million output tokens. The GPT‑5.4 >272K input tokens context length in Microsoft Foundry is priced at $5.00 per million input tokens, $0.50 per million cached input tokens, and $22.50 per million output tokens. The GPT-5.4 is available at launch in Standard Global and Standard Data Zone (US), with additional deployment options coming soon. GPT‑5.4 Pro is priced at $30.00 per million input tokens, and $180.00 per million output tokens, and is available at launch in Standard Global. Build agents for real-world workloads. Start building with GPT‑5.4 in Microsoft Foundry today.
Naomi Moneypenny
Mar 10, 2026 Place Microsoft Foundry Blog
14KViews
3likes
2Comments
From Manual Document Processing to AI-Orchestrated Intelligence
Building an IDP Pipeline with Azure Durable Functions, DSPy, and Real-Time AI Reasoning The Problem Think about what happens when a loan application, an insurance claim, or a trade finance document arrives at an organisation. Someone opens it, reads it, manually types fields into a system, compares it against business rules, and escalates for approval. That process touches multiple people, takes hours or days, and the accuracy depends entirely on how carefully it's done. Organizations have tried to automate parts of this before — OCR tools, templated extraction, rule-based routing. But these approaches are brittle. They break when the document format changes, and they can't reason about what they're reading. The typical "solution" falls into one of two camps: Manual processing. Humans read, classify, and key in data. Accurate but slow, expensive, and impossible to scale. Single-model extraction. Throw an OCR/AI model at the document, trust the output, push to downstream systems. Fast but fragile — no validation, no human checkpoint, no confidence scoring. What's missing is the middle ground: an orchestrated, multi-model pipeline with built-in quality gates, real-time visibility, and the flexibility to handle any document type without rewriting code. That's what IDP Workflow is — a six-step AI-orchestrated pipeline that processes documents end to end, from a raw PDF to structured, validated data, with human oversight built in. This isn't automation replacing people. It's AI doing the heavy lifting and humans making the final call. Architecture at a Glance POST /api/idp/start → Step 1 PDF Extraction (Azure Document Intelligence → Markdown) → Step 2 Classification (DSPy ChainOfThought) → Step 3 Data Extraction (Azure Content Understanding + DSPy LLM, in parallel) → Step 4 Comparison (field-by-field diff) → Step 5 Human Review (HITL gate — approve / reject / edit) → Step 6 AI Reasoning Agent (validation, consolidation, recommendations) → Final structured result The backend is Azure Durable Functions (Python) on Flex Consumption — customers only pay for what they use, and it scales automatically. The frontend is a Next.js dashboard with SignalR real-time updates and a Reaflow workflow visualization. Every step broadcasts stepStarted → stepCompleted / stepFailed events so the UI updates as work progresses. The pattern applies wherever organisations receive high volumes of unstructured documents that need to be classified, data-extracted, validated, and approved. The Six Steps, Explained Step 1: PDF → Markdown We use Azure Document Intelligence with the prebuilt-layout model to convert uploaded PDFs into structured Markdown — preserving tables, headings, and reading order. Markdown turns out to be a much better intermediate representation for LLMs than raw text or HTML. class PDFMarkdownExtractor: async def extract(self, pdf_path: str) -> tuple[PDFContent, Step01Output]: poller = self.client.begin_analyze_document( "prebuilt-layout", analyze_request=AnalyzeDocumentRequest(url_source=pdf_path), output_content_format=DocumentContentFormat.MARKDOWN, ) result: AnalyzeResult = poller.result() # Split into per-page Markdown chunks... Output: Per-page Markdown content, total page count, and character stats. Step 2: Document Classification (DSPy) Rather than hard-coding classification rules, we use DSPy with ChainOfThought prompting. DSPy lets us define classification as a signature — a declarative input/output contract — and the framework handles prompt optimization. class DocumentClassificationSignature(dspy.Signature): """Classify document page into predefined categories.""" page_content: str = dspy.InputField(desc="Markdown content of the document page") available_categories: str = dspy.InputField(desc="Available categories") classification: DocumentClassificationOutput = dspy.OutputField() Categories are loaded from a domain-specific classification_categories.json. Adding new categories means editing a JSON file, not code. Critically, classification is per-page, not per-document. A multi-page loan application might contain a loan form on page 1, income verification on page 2, and a property valuation on page 3 — each classified independently with its own confidence score and detected field indicators. This means multi-section documents are handled correctly downstream. Why DSPy? It gives us structured, typed outputs via Pydantic models, automatic prompt optimization, and clean separation between the what (signature) and the how (ChainOfThought, Predict, etc.). Step 3: Dual-Model Extraction (Run in Parallel) This is where things get interesting. We run two independent extractors in parallel: Azure Content Understanding (CU): A specialized Azure service that takes the raw PDF and applies a domain-specific schema to extract structured fields. DSPy LLM Extractor: Uses the Markdown from Step 1 with a dynamically generated Pydantic model (built from the domain's extraction_schema.json) to extract the same fields via an LLM. The LLM provider is selectable at runtime — Azure OpenAI, Claude, or open-weight models deployed on Azure (Qwen, DeepSeek, Llama, Phi, and more from the Azure AI Model Catalog). # In the orchestrator — fire both tasks at once azure_task = context.call_activity("activity_step_03_01_azure_extraction", input) dspy_task = context.call_activity("activity_step_03_02_dspy_extraction", input) results = yield context.task_all([azure_task, dspy_task]) Both extractors use the same domain-specific schema but approach the problem differently. Running two models gives us a natural cross-check: if both extractors agree on a field value, confidence is high. If they disagree, we know exactly where to focus human attention — not the entire document, just the specific fields that need it. Multi-Provider LLM Support The DSPy extraction and classification steps aren't locked to a single model provider. From the dashboard, users can choose between: Azure OpenAI in Foundry Models — GPT-4.1, o3-mini (default) Claude on Azure — Anthropic's Claude models Foundry Models — Open-weight models deployed on Azure via Foundry Models: Qwen 2.5 72B, DeepSeek V3/R1, Llama 3.3 70B, Phi-4, and more The third option is key: instead of routing through a third-party service, you deploy open-weight models directly on Azure as serverless API endpoints through Azure AI Foundry. These endpoints expose an OpenAI-compatible API, so DSPy talks to them the same way it talks to GPT-4.1 — just with a different api_base. You get the model diversity of the open-weight ecosystem with Azure's enterprise security, compliance, and network isolation. A factory pattern in the backend resolves the selected provider and model at runtime, so switching from Azure OpenAI to Qwen on Azure AI is a single dropdown change — no config edits, no redeployment. This makes it easy to benchmark different models against the same extraction schema and compare quality. Step 4: Field-by-Field Comparison The comparator aligns the outputs of both extractors and produces a diff report: matching fields, mismatches, fields found by only one extractor, and a calculated match percentage. This feeds directly into the human review step. Output: "Match: 87.5% (14/16 fields)" Step 5: Human-in-the-Loop (HITL) Gate The pipeline pauses and waits for a human decision. The Durable Functions orchestrator uses wait_for_external_event() with a configurable timeout (default: 24 hours) implemented as a timer race: review_event = context.wait_for_external_event(HITL_REVIEW_EVENT) timeout = context.create_timer( context.current_utc_datetime + timedelta(hours=HITL_TIMEOUT_HOURS) ) winner = yield context.task_any([review_event, timeout]) The frontend shows a side-by-side comparison panel where reviewers can see both values for each disputed field — pick Azure's value, the LLM's value, or type in a correction. They can add notes explaining their decision, then approve or reject. If nobody responds within the timeout, it auto-escalates (configurable behavior). The orchestrator doesn't poll. It doesn't check a queue. The moment the reviewer submits their decision, the pipeline resumes automatically — using Durable Functions' native external event pattern. Step 6: AI Reasoning Agent The final step uses an AI agent with tool-calling to perform structured validation, consolidate field values, and generate a confidence score. This isn't just a prompt — it's an agent backed by the Microsoft Agent Framework with purpose-built tools: validate_fields — runs domain-specific validation rules (data types, ranges, cross-field logic) consolidate_extractions — merges Azure CU + DSPy outputs using confidence-weighted selection generate_summary — produces a natural-language summary with recommendations The reasoning step can use standard models or reasoning-optimised models like o3 or o3-mini for higher-stakes validation. The agent streams its reasoning process to the frontend in real time — validation results, confidence scoring, and recommendations all appear as they're generated. Domain-Driven Design: Zero-Code Extensibility One of the most powerful design choices: adding a new document type requires zero code changes. Each domain is a folder under idp_workflow/domains/ with four JSON files: idp_workflow/domains/insurance_claims/ ├── config.json # Domain metadata, thresholds, settings ├── classification_categories.json # Page-level classification taxonomy ├── extraction_schema.json # Field definitions (used by both extractors) └── validation_rules.json # Business rules for the reasoning agent The extraction_schema.json is particularly interesting — it's consumed by both the Azure CU service (which builds an analyzer from it) and the DSPy extractor (which dynamically generates a Pydantic model at runtime): def create_extraction_model_from_schema(schema: dict) -> type[BaseModel]: """Dynamically create a Pydantic model from an extraction schema JSON.""" # Maps schema field definitions → Pydantic field annotations # Supports nested objects, arrays, enums, and optional fields We currently ship four domains out of the box: insurance claims, home loans, small business lending, and trade finance. See It In Action: Processing a Home Loan Application To make this concrete, here's what happens when you process a multi-page home loan PDF — personal details, financial tables, and mixed content. Upload & Extract. The document hits the dashboard and Step 1 kicks off. Azure Document Intelligence converts all pages to structured Markdown, preserving tables and layout. You can preview the Markdown right in the detail panel. Per-Page Classification. Step 2 classifies each page independently: Page 1 is a Loan Application Form, Page 2 is Income Verification, Page 3 is a Property Valuation. Each has its own confidence score and detected fields listed. Dual Extraction. Azure CU and the DSPy LLM extractor run simultaneously. You can watch both progress bars in the dashboard. Comparison. The system finds 16 fields total. 14 match between the two extractors. Two fields differ — the annual income figure and the loan term. Those are highlighted for review. Human Review. The reviewer sees both values side by side for each disputed field, picks the correct value (or types a correction), adds a note, and approves. The moment they submit, the pipeline resumes — no polling. AI Reasoning. The agent validates against home loan business rules: loan-to-value ratio, income-to-repayment ratio, document completeness. Validation results stream in real time. Final output: 92% confidence, 11 out of 12 validations passed. The AI flags a minor discrepancy in employment dates and recommends approval with a condition to verify employment tenure. Result: A document that would take 30–45 minutes of manual processing, handled in under 2 minutes — with complete traceability. Every step, every decision, timestamped in the event log. Real-Time Frontend with SignalR Every orchestration step broadcasts events through Azure SignalR Service, targeted to the specific user who started the workflow: def _broadcast(context, user_id, event, data): return context.call_activity("notify_user", { "user_id": user_id, "instance_id": context.instance_id, "event": event, "data": data, }) The frontend generates a session-scoped userId, passes it via the x-user-id header during SignalR negotiation, and receives only its own workflow events. No Pub/Sub subscriptions to manage. The Next.js frontend uses: Zustand + Immer for state management (4 stores: workflow, events, reasoning, UI) Reaflow for the animated pipeline visualization React Query for data fetching Tailwind CSS for styling The result is a dashboard where you can upload a document and watch each pipeline step execute in real time. Infrastructure: Production-Ready from Day One The entire stack deploys with a single command using Azure Developer CLI (azd): azd up What gets provisioned: Resource Purpose Azure Functions (Flex Consumption) Backend API + orchestration Azure Static Web App Next.js frontend Durable Task Scheduler Orchestration state management Storage Account Document blob storage Application Insights Monitoring and diagnostics Network Security Perimeter Storage network lockdown Infrastructure is defined in Bicep with: Parameterized configuration (memory, max instances, retention) RBAC role assignments via a consolidated loop Two-region deployment (Functions + SWA have different region availability) Network Security Perimeter deployed in Learning mode, switched to Enforced post-deploy Key Engineering Decisions Why Durable Functions? Orchestrating a multi-step pipeline with parallel execution, external event gates, timeouts, and retry logic is exactly what Durable Functions was designed for. The orchestrator is a Python generator function — each yield is a checkpoint that survives process restarts: def idp_workflow_orchestration(context: DurableOrchestrationContext): step1 = yield from _execute_step(context, ...) # PDF extraction step2 = yield from _execute_step(context, ...) # Classification results = yield context.task_all([azure_task, dspy_task]) # Parallel extraction # ... HITL gate, reasoning agent, etc. No external queue management. No state database. No workflow engine to operate. Why Dual Extraction? Running two independent models on the same document gives us: Cross-validation — agreement between models is a strong confidence signal Coverage — one model might extract fields the other misses Auditability — human reviewers can see both outputs side by side Graceful degradation — if one service is down, the other still produces results Why DSPy over Raw Prompts? DSPy provides: Typed I/O — Pydantic models as signatures, not string parsing Composability — ChainOfThought, Predict, ReAct are interchangeable modules Prompt optimization — once you have labeled examples, DSPy can auto-tune prompts LM scoping — with dspy.context(lm=self.lm): isolates model configuration per call Getting Started # Clone git clone https://github.com/lordlinus/idp-workflow.git cd idp-workflow # DTS Emulator (requires Docker) docker run -d -p 8080:8080 -p 8082:8082 \ -e DTS_TASK_HUB_NAMES=default,idpworkflow \ mcr.microsoft.com/dts/dts-emulator:latest # Backend python -m venv .venv && source .venv/bin/activate pip install -r requirements.txt func start # Frontend (separate terminal) cd frontend && npm install && npm run dev You'll also need Azurite (local storage emulator) running, plus Azure OpenAI, Document Intelligence, Content Understanding, and SignalR Service endpoints configured in local.settings.json. See the Local Development Guide for the full setup. Who Is This For? If any of these sound familiar, IDP Workflow was built for you: "We're drowning in documents." — High-volume document intake with manual processing bottlenecks. "We tried OCR but it breaks on new formats." — Brittle extraction that fails when layouts change. "Compliance needs an audit trail for every decision." — Regulated industries where traceability is non-negotiable. This is an AI-powered document processing platform — not a point OCR tool — with human oversight, dual AI validation, and domain extensibility built in from day one. What's Next Prompt optimization — using DSPy's BootstrapFewShot with domain-specific training examples Batch processing — fan-out/fan-in orchestration for processing document queues Custom evaluators — automated quality scoring per domain Additional domains — community-contributed domain configurations Try It Out The project is fully open source: github.com/lordlinus/idp-workflow Deploy to your own Azure subscription with azd up, upload a PDF from the sample_documents/ folder, and watch the pipeline run. We'd love feedback, contributions, and new domain configurations. Open an issue or submit a PR!
Sunil_Sattiraju
Mar 05, 2026 Place Microsoft Foundry Blog
517Views
0likes
1Comment
Unified AI Weather Forecasting Pipeline thru Aurora, Foundry, and Microsoft Planetary Computer Pro
Weather shapes some of the most critical decisions we make, from protecting our critical infrastructure and global supply chains, to keeping communities safe during extreme events. As climate variability becomes more volatile, an organization’s ability to predict, assess, and plan their response to extreme weather is a defining capability for modern infrastructure owners & operators. This is especially true for the energy and utility sector — even small delays in preparations and response can cascade into massive operational risk and financial impacts, including widespread outages and millions in recovery costs. Operators of critical power infrastructure are increasingly turning to AI-powered solutions to reduce their operational and service delivery risk. “As the physical risks to our grid systems grow, so too does our technological capacity to anticipate them. Artificial intelligence has quietly reached a maturity point in utility operations-not just as a tool for optimization, about as a strategic foresight engine. The opportunity is clear: with the right data, infrastructure, and operational alignment, AI outage prediction utility grid strategies can now forecast vulnerabilities with precision and help utilities transition from reactive to preventive risk models.” – Article by Think Power Solutions Providing direct control of their data and AI analytics allows providers to make better, more actionable insights for their operations. Today, we’ll demonstrate and explore how organizations can use the state-of-the-art Aurora weather model in Microsoft Foundry with weather data provided by Microsoft Planetary Computer (MPC), an Azure based geospatial data management platform, to develop a utility industry-specific impact prediction capability. Taking Control of your Weather Prediction Microsoft Research first announced Aurora in June 2024, a cutting-edge AI foundation model enabling locally executed, on-demand, global weather forecasting, and storm-trajectory prediction generated from publicly available weather data. Two months later, Aurora became available on Microsoft Foundry elevating on-demand weather forecasting from a self-hosted experience to managed deployments, readying Aurora for broader enterprise and public adoption. Aurora’s scientific foundations and forecasting performance were peer‑reviewed and published in Nature, providing independent validation across global benchmarks. Its evolution continues with a strong commitment to openness and interoperability: In November 2025, Microsoft announced plans to open-source Aurora to accelerate innovation across the global research and developer community. Building upon the innovation and continued development of Aurora, today we are showcasing how organizations can operationalize this state-of-the-art capability with Microsoft Planetary Computer and Microsoft Planetary Computer Pro. By bringing together the vast public geospatial data stores in Planetary Computer, with the private data managed by Planetary Computer Pro, organizations can unify their weather prediction and geospatial data in a single platform, simplifying data processing pipelines and data management. This advancement allows enterprise customers to take control of their own weather forecasting on their own timeline. A Unified Weather Prediction Data Pipeline In addition, a key pain-point for energy and utility companies is the inability to reliably ingest, store, and operationalize high-volume weather data. Model inputs and outputs often sit scattered across fragmented pipelines and platforms, making decisions difficult to trace, reproduce, and reference over time. For example, referenced in articles, many utility companies have to pull public data from various silos, maintain GIS layers in another, and run operational planning in a separate environment—forcing teams to manually stitch together forecasts, assets, and risk assessments, introducing delays exactly when rapid decisions matter most. With the MPC Pro + Microsoft Foundry pipeline, utility companies transition from fragmented, manual workflows to a single operating platform – where the value lies in a seamless end-to-end data-to-model pipeline. Users can leverage Aurora on Microsoft Foundry alongside Microsoft Planetary Computer Pro’s geospatial data platform to unlock the following unified workflow: Source near real time weather data from Planetary Computer Run Aurora in Microsoft Foundry Fuse weather prediction results with geospatial data in Planetary Computer Pro for rapid assessment and post processing A Ready-to-use reference architecture This reference architecture provides a reusable pattern for operationalizing frontier weather models with Microsoft Planetary Computer Pro and Microsoft Foundry. Our architecture feeds updated global weather data, hosted by Microsoft Planetary Computer, to the Microsoft Foundry hosted model, then fuses those prediction results with enterprise geospatial context for analysis, decision-making, and action. Each component plays a distinct role in ensuring forecasts are timely, scalable, and directly usable within operational workflows. Near Real-Time Weather Data Microsoft Planetary Computer automatically ingests, indexes, and distributes up-to-date global weather data from the European Centre for Medium-Range Weather Forecasts (ECMWF) four times per day. This fully managed data pipeline ensures that the latest atmospheric datasets are continuously refreshed, standardized, and readily accessible, eliminating the need for manual data acquisition or preprocessing. Storing and Centralizing Public and Private Geospatial Data on Microsoft Planetary Computer Pro Microsoft Planetary Computer Pro enables utility operators to store, manage, and access both public and private geospatial datasets within a single Azure platform. With a Microsoft Planetary Computer Pro GeoCatalog, organizations can centralize ECMWF weather data alongside infrastructure and location data to support downstream analyses. Microsoft Foundry Hosts and Runs Weather Prediction Model on Demand Microsoft Foundry provides model access and the infrastructure required to support execution of Aurora and other weather forecasting models. Users can provision Aurora inference endpoints on their own dedicated compute. After provisioned, the user would be able to open the python notebook and run the model to execute weather forecasts on demand. Weather Forecast Outputs are Fused with Existing Data Sources on Microsoft Planetary Computer Pro Aurora’s weather prediction outputs are seamlessly integrated back into Microsoft Planetary Computer Pro, where they are fused with existing public or private geospatial datasets. This makes forecast results immediately accessible for visualization, post-processing, and analysis—such as identifying assets at risk, estimating localized impact, informing operational response plans, or pre-positioning needed assets for quick recovery. By combining AI-driven forecasts with geospatial context, organizations can move from raw predictions to actionable insights in a single workflow. This solution also provides organizations with a centralized platform to store and catalog geospatial data for future traceability. Unified Weather Prediction Demonstration This demonstration visualizes the forecast storm track (Figure 2), along with projected damage impact along the storm path and associated coastal surge areas (Figure 3 & 4). This enables users to assess asset exposure, anticipate damage due to winds, pre-position crews, and proactively protect critical infrastructure—helping reduce outage duration, lower operational costs, and improve grid resilience. & Powerplants) Getting Started The python notebook supports tracking of historical storm events, forecasting real-time storm trajectories, and overlaying critical power infrastructure structure data from OpenStreetMap to visualize overlap. To get started, deploy this solution in your Azure environment to begin generating weather forecasts and storm-track predictions. The code and documentation for running this notebook are available in the linked GitHub Repo. Sample output for you to explore are linked within this HTML. For additional resources, visit the following MS Learn pages: Microsoft Planetary Computer Pro Microsoft Foundry The interoperability between ‘GeoAI models + data platform’ extends far beyond weather prediction. It empowers organizations to take control of their geospatial data; to generate actionable insights on their own timeline, and to meet their own specific needs. With Microsoft Planetary Computer and Microsoft Foundry together, organizations will unify their enterprise geospatial data, and unlock its value with powerful, and state of the art AI solutions.
Yves-Pitsch
Mar 05, 2026 Place Microsoft Foundry Blog
790Views
4likes
0Comments
Introducing GPT-5.3 Chat in Microsoft Foundry: A more grounded way to chat at enterprise scale
OpenAI’s GPT‑5.3 Chat marks the next step in the GPT‑5 series, designed to deliver more dependable, context‑aware chat experiences for enterprise workloads. The model emphasizes steadier instruction handling and clearer responses, supporting high‑volume, real‑world conversations with greater consistency. GPT‑5.3 Chat is now available via API in Microsoft Foundry, where teams will be able to deploy production‑ready chat and agent experiences that are standardized, governed, and built to scale across the enterprise. What’s new in GPT‑5.3 Chat GPT‑5.3 Chat centers on predictable behavior, relevance, and response quality, helping teams build chat experiences that operate reliably across end‑to‑end workflows while aligning with enterprise safety and compliance expectations. Fewer dead ends, more resolved conversations Reduces unnecessary refusals by responding more proportionately when safe context is available Supports compliant reformulation to keep interactions moving forward Enables end‑to‑end resolution in support, IT, and policy‑driven workflows Grounded answers you can operationalize Combines built‑in web search with model reasoning to surface relevant, actionable information Prioritizes relevance and context over long lists of loosely related results Keeps responses actionable while maintaining enterprise controls and traceability Consistent outputs at scale Improved tone, explanation quality, and instruction following Easier to template, govern, and monitor across apps Less downstream cleanup as usage scales Built for production in Microsoft Foundry Production‑grade infrastructure Observability, failover, quota management, and performance monitoring Designed for real workloads—not experiments Consistent behavior across regions and use cases without re‑architecting Smarter scaling with quota tiers Automatic quota increases with sustained usage Fewer rate‑limit interruptions as demand grows Flexible tiers from Free through Tier 6 Security and compliance by default Identity, access controls, policy enforcement, and data boundaries built in Meets regulated‑industry requirements out of the box Teams can move fast without compromising trust GPT-5.3 Chat in Microsoft Foundry is priced at $1.75 per million input tokens, $0.175 per million cached input tokens, and $14.00 per million output tokens. Ready to build with GPT‑5.3 Chat in Foundry? Start turning reliable conversations into real applications. Explore GPT-5.3 Chat in Microsoft Foundry and begin building production ready‑ chat and agent experiences today.
Naomi Moneypenny
Mar 04, 2026 Place Microsoft Foundry Blog
4.9KViews
1like
1Comment
Introducing Phi-4-Reasoning-Vision to Microsoft Foundry
Vision reasoning models unlock a critical capability for developers: the ability to move beyond passive perception toward systems that can understand, reason over, and act on visual information. Instead of treating images, diagrams, documents, or UI screens as unstructured inputs, vision reasoning models enable developers to build applications that can interpret visual structure, connect it with textual context, and perform multi-step reasoning to reach actionable conclusions. Today, we are excited to announce Phi-4-Reasoning-Vision-15B is available in Microsoft Foundry and Hugging Face. This model brings high‑fidelity vision to the reasoning‑focused Phi‑4 family, extending small language models (SLMs) beyond perception into structured, multi‑step visual reasoning for agents, analytical tools, and scientific workflows. What’s new? The Phi model family has advanced toward combining efficient visual understanding with strong reasoning in small language models. Earlier Phi‑4 models demonstrated reliable perception and grounding across images and text, while later iterations introduced structured reasoning to improve performance on complex tasks. Phi‑4‑reasoning-vision-15B brings these threads together, pairing high‑resolution visual perception with selective, task‑aware reasoning. As a result, the model can reason deeply when needed while remaining fast and efficient for perception‑focused scenarios—making it well suited for interactive, real‑world applications. Key capabilities Reasoning behavior is explicitly enabled via prompting: Developers can explicitly enable or disable reasoning to balance latency and accuracy at runtime. Optimized for vision reasoning and can be used for: diagram-based math, document, chart, and table understanding, GUI interpretations and grounding for agent scenarios to interpret screens and actions, Computer-use agent scenarios, and General image chat and answering questions Benchmarks The following results summarize Phi-4-reasoning-vision-15B performance across a set of established multimodal reasoning, mathematics, and computer use benchmarks. The following benchmarks are the result of internal evaluations. Benchmark Phi-4-reasoning-vision-15B Phi-4-reasoning-vision-15B – force no think Phi-4-mm-instruct Kimi-VL-A3B-Instruct gemma-3-12b-it Qwen3-VL-8B-Instruct-4K Qwen3-VL-8B-Instruct-32K Qwen3-VL-32B-Instruct-4K Qwen3-VL-32B-Instruct-32K AI2D _TEST 84.8 84.7 68.6 84.6 80.4 82.7 83 84.8 85 ChartQA _TEST 83.3 76.5 23.5 87 39 83.1 83.2 84.3 84 HallusionBench 64.4 63.1 56 65.2 65.3 73.5 74.1 74.4 74.9 MathVerse _MINI 44.9 43.8 32.4 41.7 29.8 54.5 57.4 64.2 64.2 MathVision _MINI 36.2 34.2 20 28.3 31.9 45.7 50 54.3 60.5 MathVista _MINI 75.2 68.7 50.5 67.1 57.4 77.1 76.4 82.5 81.8 MMMU _VAL 54.3 52 42.3 52 50 60.7 64.6 68.6 70.6 MMStar 64.5 63.3 45.9 60 59.4 68.9 69.9 73.7 74.3 OCRBench 76 75.6 62.6 86.5 75.3 89.2 90 88.5 88.5 ScreenSpot _v2 88.2 88.3 28.5 89.8 3.5 91.5 91.5 93.7 93.9 Table 1: Accuracy comparisons relative to popular open-weight, non-thinking models Benchmark Phi-4-reasoning-vision-15B Phi-4-reasoning-vision-15B - force thinking Kimi-VL-A3B-Thinking gemma-3-12b-it Qwen3-VL-8B-Thinking-4K Qwen3-VL-8B-Thinking-40K Qwen3-VL-32B-Thiking-4K Qwen3-VL-32B-Thinking-40K AI2D_TEST 84.8 79.7 81.2 80.4 83.5 83.9 86.9 87.2 ChartQA _TEST 83.3 82.9 73.3 39 78 78.6 78.5 79.1 HallusionBench 64.4 63.9 70.6 65.3 71.6 73 76.4 76.6 MathVerse _MINI 44.9 53.1 61 29.8 67.3 73.3 78.3 78.2 MathVision _MINI 36.2 36.2 50.3 31.9 43.1 50.7 60.9 58.6 MathVista _MINI 75.2 74.1 78.6 57.4 77.7 79.5 83.9 83.8 MMMU _VAL 54.3 55 60.2 50 59.3 65.3 72 72.2 MMStar 64.5 63.9 69.6 59.4 69.3 72.3 75.5 75.7 OCRBench 76 73.7 79.9 75.3 81.2 82 83.7 85 ScreenSpot _v2 88.2 88.1 81.8 3.5 93.3 92.7 83.1 83.1 Table 2: Accuracy comparisons relative to popular open-weight, thinking models All results were obtained using a consistent evaluation setup and prompts across models; numbers are provided for comparison and analysis rather than as leaderboard claims. For more information regarding benchmarks and evaluations, please read the technical paper on the Microsoft Research hub. Suggested use cases and applications Phi‑4‑Reasoning-Vision-15B supports applications that require both high‑fidelity visual perception and structured inference. Two representative scenarios include scientific and mathematical reasoning over visual inputs, and computer‑using agents (CUAs) that operate directly on graphical user interfaces. In both cases, the model provides grounded visual understanding paired with controllable, low‑latency reasoning suitable for interactive systems. Computer use agents in retail scenarios For computer use agents, Phi‑4‑Reasoning-Vision-15B provides the perception and grounding layer required to understand and act within live ecommerce interfaces. For example, in an online shopping experience, the model interprets screen content—products, prices, filters, promotions, buttons, and cart state—and produces grounded observations that agentic models like Fara-7B can use to select actions. Its compact size and low latency inference make it well suited for CUA workflows and agentic applications. Visual reasoning for education Another practical use of visual reasoning models is education. A developer could build a K‑12 tutoring app with Phi‑4‑Reasoning‑Vision‑15B where students upload photos of worksheets, charts, or diagrams to get guided help—not answers. The model can understand the visual content, identify where the student went wrong, and explain the correct steps clearly. Over time, the app can adapt by serving new examples matched to the student’s learning level, turning visual problem‑solving into a personalized learning experience. Microsoft Responsible AI principles At Microsoft, our mission to empower people and organizations remains constant—especially in the age of AI, where the potential for human achievement is greater than ever. We recognize that trust is foundational to AI adoption, and earning that trust requires a commitment to transparency, safety, and accountability. As with other Phi models, Phi-4-Reasoning-Vision-15B was developed with safety as a core consideration throughout training and evaluation. The model was trained on a mixture of public safety datasets and internally generated examples designed to elicit behaviors the model should appropriately refuse, in alignment with Microsoft’s Responsible AI Principles. These safety focused training signals help the model recognize and decline requests that fall outside intended or acceptable use. Additional details on the model’s safety considerations, evaluation approach, and known limitations are provided in the accompanying technical blog and model card. Getting started Start using Phi‑4‑Reasoning-Vision-15B in Microsoft Foundry today. Microsoft Foundry provides a unified environment for model discovery, evaluation, and deployment, making it straightforward to move from initial experimentation to production use while applying appropriate safety and governance practices. Deploy the new model on Microsoft Foundry. Learn more about the Phi family on Foundry Labs and in the Phi Cookbook Connect to the Microsoft Developer Community on Discord Read the technical paper on Microsoft Research Read more use cases on the Educators Developer blog
yashlara
Mar 04, 2026 Place Microsoft Foundry Blog
960Views
0likes
0Comments
Grok 4.0 Goes GA in Microsoft Foundry and Grok 4.1 Fast Arrives with Major Enhancements
We first brought Grok 4.0 to Microsoft Foundry in September 2025, marking an important milestone in expanding Foundry’s multi-model ecosystem with frontier reasoning models from xAI. Since then, customer interest and usage have continued to build as developers explored Grok’s strengths in fast reasoning, sense-making, and interpretation of complex, ambiguous information. Today, we’re excited to announce that Grok 4.0 is now generally available (GA) in Microsoft Foundry, giving enterprises a production-ready path to deploy Grok at scale. Building on that momentum, Grok 4.1 Fast (Reasoning and Non-reasoning) are now available in Microsoft Foundry. Grok 4.1 introduces a suite of improvements that enhance conversation quality, creativity, and emotional intelligence while maintaining core reasoning strengths. According to xAI, Grok 4.1 delivers more natural, fluid dialogue compared with earlier versions. Introducing Grok 4.1 Fast (Reasoning and Non-Reasoning) Grok 4.1 Fast is optimized for speed, scale, and agentic execution, giving developers flexibility to choose between reasoning and non-reasoning variants depending on workload requirements. Grok 4.1 Fast (Reasoning): Designed for scenarios that require rapid multi-step reasoning, structured decision-making, and interpretation of complex inputs. This variant is well-suited for agent workflows, analysis pipelines, and applications that need fast responses without sacrificing reasoning depth. Grok 4.1 Fast (Non-Reasoning): Optimized for maximum throughput and low latency, this variant is ideal for tasks such as summarization, classification, content transformation, and tool-driven execution where deterministic speed and efficiency matter more than deep reasoning. Together, these options allow teams to right-size performance and cost by selecting the appropriate Grok 4.1 Fast variant for each stage of an application from high-volume preprocessing and orchestration to targeted reasoning tasks. What’s New with Grok 4.1 Fast? Grok 4.1 brings several enhancements that broaden the model’s applicability and user experience: Improved Conversational Quality: According to xAI, Grok 4.1 Fast offers smoother, more natural interaction patterns, making it more comfortable and intuitive to engage with, especially in multi-turn dialogues. Enhanced Creativity and Emotional Awareness: According to xAI, Grok 4.1 Fast demonstrates stronger creative writing capabilities and greater emotional intelligence, helping it generate more expressive and engaging outputs that better align with human expectations. Reduced Hallucination and Better Reliability: According to xAI, Grok 4.1 Fast produces fewer factual inaccuracies than its predecessor These enhancements make Grok 4.1 Fast a compelling choice for use cases that require engaging conversational interfaces, creative support, and rich natural language interaction. As with all frontier AI models, Grok-4.1 Fast introduces new capabilities alongside new operational considerations. Microsoft’s safety and responsible AI evaluations indicate that Grok-4.1 Fast may demonstrate increased risks in safety testing compared with other models available through Azure. In practice, this means there may be an increased risk of generating explicit or potentially harmful content. To support responsible deployment, customers should implement system-level safety instructions and leverage Azure AI Content Safety (AACS) to help monitor and filter outputs. Because no single safety system can address every possible risk scenario, customers are encouraged to conduct their own evaluations and validation before deploying Grok-4.1 in production systems. To provide enhanced safety and enterprise reliability, Microsoft's deployment of Grok 4.1 features a system-applied safety prompt that cannot be disabled. Customers are expected to operate the model without attempting to bypass or interfere with this feature. Enterprise-Ready Deployment via Microsoft Foundry With Grok 4.0 now GA in Foundry, enterprises gain the ability to incorporate advanced reasoning models into their workflows while enjoying the governance, compliance, and operational tooling that Azure provides. Models hosted in Foundry can be deployed serverless or with provisioned throughput, and customers benefit from centralized billing, identity integration, and access to other Azure services. Foundry’s model catalog also includes other Grok variants such as Grok 4.1 Fast and related non-reasoning SKUs, giving enterprises flexibility to balance performance, latency, and cost depending on their workloads. Pricing Model Deployment Input/1M Tokens Output/1M Tokens Availability Grok 4.1 Fast (Non-Reasoning) Global Standard $0.2 $0.5 Public Preview on 2/27/2026 Grok 4.1 Fast (Reasoning) Global Standard $0.2 $0.5 Public Preview on 3/4/2026 Looking Ahead The combination of Grok’s deep reasoning capabilities with the enterprise readiness of Microsoft Foundry opens new possibilities for production AI applications, from complex analytical agents and research assistants to creative and customer-facing experiences. With Grok 4.1’s conversational refinements further raising the model’s usability and expressiveness, Foundry customers can now experiment with and scale a broader set of AI-driven solutions, all within a trusted, governed environment. As Microsoft continues to expand Foundry’s catalog and partners like xAI continue to innovate, organizations have more options than ever to power next-generation AI applications across industries, use cases, and domains. Try Grok 4.1 Non-Reasoning <AI Model Catalog | Microsoft Foundry Models> Reasoning <AI Model Catalog | Microsoft Foundry Models>
Naomi Moneypenny
Mar 04, 2026 Place Microsoft Foundry Blog
810Views
0likes
0Comments
Announcing GPT‑5.2‑Codex in Microsoft Foundry: Enterprise‑Grade AI for Secure Software Engineering
Enterprise developers know the grind: wrestling with legacy code, navigating complex dependency challenges, and waiting on security reviews that stall releases. OpenAI’s GPT‑5.2‑Codex flips that equation and helps engineers ship faster without cutting corners. It’s not just autocomplete; it’s a reasoning engine for real-world software engineering. Generally available starting today through Azure OpenAI in Microsoft Foundry Models, GPT‑5.2‑Codex is built for the realities of enterprise codebases, large repos, evolving requirements, and security constraints that can’t be overlooked. As OpenAI’s most advanced agentic coding model, it brings sustained reasoning, and security-aware assistance directly into the workflows enterprise developers already rely on with Microsoft’s secure and reliable infrastructure. GPT-5.2-Codex at a Glance GPT‑5.2‑Codex is designed for how software gets built in enterprise teams. You start with imperfect inputs including legacy code, partial docs, screenshots, diagrams, and work through multi‑step changes, reviews, and fixes. The model helps keep context, intent, and standards intact across that entire lifecycle, so teams can move faster without sacrificing quality or security. What it enables Work across code and artifacts: Reason over source code alongside screenshots, architecture diagrams, and UI mocks — so implementation stays aligned with design intent. Stay productive in long‑running tasks: Maintain context across migrations, refactors, and investigations, even as requirements evolve. Build and review with security in mind: Get practical support for secure coding patterns, remediation, reviews, and vulnerability analysis — where correctness matters as much as speed. Feature Specs (quick reference) Context window: 400K tokens (approximately 100K lines of code) Supported languages: 50+ including Python, JavaScript/TypeScript, C#, Java, Go, Rust Multimodal inputs: Code, images (UI mocks, diagrams), and natural language API compatibility: Drop-in replacement for existing Codex API calls Use cases where it really pops Legacy modernization with guardrails: Safely migrate and refactor “untouchable” systems by preserving behavior, improving structure, and minimizing regression risk. Large‑scale refactors that don’t lose intent: Execute cross‑module updates and consistency improvements without the typical “one step forward, two steps back” churn. AI‑assisted code review that raises the floor: Catch risky patterns, propose safer alternatives, and improve consistency, especially across large teams and long‑lived codebases. Defensive security workflows at scale: Accelerate vulnerability triage, dependency/path analysis, and remediation when speed matters, but precision matters more. Lower cognitive load in long, multi‑step builds: Keep momentum across multi‑hour sessions: planning, implementing, validating, and iterating with context intact. Pricing Model Input Price/1M Tokens Cached Input Price/1M Tokens Output Price/1M Tokens GPT-5.2-Codex $1.75 $0.175 $14.00 Security Aware by Design, not as an Afterthought For many organizations, AI adoption hinges on one nonnegotiable question: Can this be trusted in security sensitive workflows? GPT-5.2-Codex meaningfully advances the Codex lineage in this area. As models grow more capable, we’ve seen that general reasoning improvements naturally translate into stronger performance in specialized domains — including defensive cybersecurity. With GPT‑5.2‑Codex, this shows up in practical ways: Improved ability to analyze unfamiliar code paths and dependencies Stronger assistance with secure coding patterns and remediation More dependable support during code reviews, vulnerability investigations, and incident response At the same time, Microsoft continues to deploy these capabilities thoughtfully balancing access, safeguards, and platform level controls so enterprises can adopt AI responsibly as capabilities evolve. Why Run GPT-5.2-Codex on Microsoft Foundry? Powerful models matter — but where and how they run matters just as much for enterprise. Organizations choose Microsoft Foundry because it combines Foundry frontier AI with Azure enterprise grade fundamentals: Integrated security, compliance, and governance Deploy GPT-5.2-Codex within existing Azure security boundaries, identity systems, and compliance frameworks — without reinventing controls. Enterprise ready orchestration and tooling Build, evaluate, monitor, and scale AI powered developer experiences using the same platform teams already rely on for production workloads. A unified path from experimentation to scale Foundry makes it easier to move from proof of concept to real deployment —without changing platforms, vendors, or operating assumptions. Trust at the platform level For teams working in regulated or security critical environments, Foundry and Azure provide assurances that go beyond the model itself. Together with GitHub Copilot, Microsoft Foundry provides a unified developer experience — from in‑IDE assistance to production‑grade AI workflows — backed by Azure’s security, compliance, and global scale. This is where GPT-5.2-Codex becomes not just impressive but adoptable. Get Started Today Explore GPT‑5.2‑Codex in Microsoft today. Start where you already work: Try GPT‑5.2‑Codex in GitHub Copilot for everyday coding and scale the same model to larger workflows using Azure OpenAI in Microsoft Foundry. Let’s build what’s next with speed and security.
Naomi Moneypenny
Mar 04, 2026 Place Microsoft Foundry Blog
16KViews
3likes
1Comment
Unlocking Document Understanding with Mistral Document AI in Microsoft Foundry
Enterprises today face a familiar yet formidable challenge: mountains of documents -contracts, invoices, reports, forms - remain locked in unstructured formats. Traditional OCR (optical character recognition) captures text, but often struggles with context, layout complexity, or multilingual content. The result? Slow workflows, error-prone manual reviews, and missed insights. Enter mistral-document-ai-2512 in Microsoft Foundry. This new model brings together high-end OCR using mistral-ocr-2512 and intelligent document understanding using mistral-small-2506 to turn unstructured documents into actionable data. It doesn’t just “read” pages - it understands them: multi-column layouts, handwritten annotations, tables with merging cells, multilingual content-all processed with enterprise-grade speed and precision. In this blog, we’ll explore what Mistral Document AI 2512 is, why it matters, how it stacks up, and the business impact it promises, especially when paired with solution accelerators like ARGUS. Meet Mistral Document AI Mistral Document AI is an enterprise-grade document understanding model, offered via Microsoft Foundry. It’s built to convert both physical (scans, photos) and digital (PDFs, DOCX) documents into highly structured, machine-readable outputs. Key features include: Top-tier accuracy: According to benchmarks, Mistral’s OCR 2512 stacks display significantly higher accuracy than many alternatives, especially on scanned documents and complex layouts. For example, in comparisons it achieved ~95.9 % “overall” vs ~89-91 % for other platforms Global / multilingual reach: In language-by-language tests (Russian, French, German, Spanish, Chinese, etc), Mistral’s error-rate/fuzzy-match metrics reached 99 %+ in many cases Layout & context awareness: It’s built to not just extract linear text but understand multi-column layouts, tables, charts, images, handwritten input and more Structured output functionality: The model supports structured extraction (JSON), markup (Markdown with interleaved images), preserving document structure for downstream systems Enterprise-ready deployment: With availability via Microsoft Foundry and support for private/secure inference, the model is geared for regulated industries and high-volume workflows Putting it another way: where traditional OCR stops at “here’s the raw text on page 7”, Mistral DocumentAI 2512 can say “here’s the vendor invoice, here are line-items, here’s the total, here’s the signature block, and here’s the part that was handwritten”, ready to plug into downstream systems. Business Impact & Industry examples Mistral Document AI isn’t just another OCR tool; it’s a strategic enabler that turns document-heavy operations into intelligent, automated workflows. The business value comes down to four key advantages: Speed and efficiency: Automating document understanding eliminates manual reviews and retyping. Tasks that took days can be done in minutes, accelerating core business processes Accuracy and consistency: With 99 %+ recognition accuracy and deep layout understanding, Mistral delivers cleaner data and fewer downstream errors - essential in compliance-critical or analytics-driven operations Cost and productivity gains: Reducing manual extraction frees teams for higher-value work, cutting operational costs while increasing output per employee Scalability and adaptability: Cloud-native performance allows organizations to scale document processing instantly during peak loads, across multiple languages and formats, without sacrificing quality Overall, mistral-document-ai-2512 excels where consistency and quality are critical. Industry and Use Cases In regulated industries or big-data scenarios, even a small improvement in accuracy or speed can translate into substantial business gains. Its benchmarks indicate not just incremental progress, but a major step forward - giving enterprises a powerful new engine for their document workflows. Here’s where that impact becomes tangible: Financial services: Banks and insurers handle vast document volumes - loan applications, KYC forms, and claims reports - where data integrity and auditability are non-negotiable. Mistral automates extraction, classification, and clause identification across diverse formats, improving turnaround time and compliance accuracy while reducing manual handling costs Healthcare & life sciences: Clinical records, lab results, and insurance claims often combine handwritten, tabular, and multi-language content. Mistral’s layout awareness and multilingual support ensure clean, structured datasets for downstream analytics and regulatory submissions Manufacturing & logistics: From quality certificates to shipping manifests, Mistral streamlines the flow of operational documents. It can extract production parameters, vendor data, and timestamps at scale - building a unified, queryable data layer that supports supply chain traceability Legal & public sector: Legal teams and agencies depend on consistency and transparency. Mistral helps index, summarise, and validate contracts or permits with full structural fidelity - dramatically cutting review cycles while maintaining evidential quality Retail & consumer goods: Retailers process supplier invoices, product specifications, and marketing briefs from global partners. With Mistral’s multilingual precision and structure preservation, global document flows become searchable and analytics-ready Across these industries, the result is the same: cleaner data, faster throughput, and fewer human errors - the foundation for more reliable decisions and more agile operations. Pricing Argus – A ready-to-implement accelerator to start using Mistral Document AI To spin up a solution faster, one can leverage solution accelerators such as ARGUS (open-source repository available on GitHub). ARGUS serves as a full-pipeline implementation: from document ingestion, OCR/extraction (via Mistral Document AI), to downstream processing and structured output. It shows how to deploy end-to-end, integrate with storage, preprocess documents, handle large-scale batches, output JSON schemas, and integrate into existing business workflows. Mistral Document AI Integration ARGUS now offers flexible OCR provider selection with Mistral Document AI as one of the several options. This enhancement gives you the freedom to choose the best OCR engine for your specific document processing needs. Key Features: Dual Provider Support: Toggle between Azure Document Intelligence (default) and Mistral Document AI Runtime Switching: Change OCR providers on-the-fly through the Settings UI without redeployment Simple Configuration: Set up Mistral via environment variables (OCR_PROVIDER, MISTRAL_DOC_AI_ENDPOINT, MISTRAL_DOC_AI_KEY) or the web interface Seamless Integration: Both providers expose the same interface, ensuring consistent behavior across your document processing pipeline Why This Matters: Different OCR engines excel at processing different document content. Azure Document Intelligence offers enterprise-grade form and table recognition, while Mistral Document AI 2512, in addition, enables extraction to structured JSON with customizable schemas, document classification, and image processing—including text, charts, and signatures. It can convert charts into tables, extract fine print from figures, and even define custom image types for specialized workflows. Now you can select the optimal provider for each use case. In effect, instead of building from scratch, ARGUS gives you the legs to run: pipeline orchestration, ingestion, error-handling, schema-mapping, output integration-all wired to Mistral’s engine. This significantly accelerates time-to-value and reduces risk for enterprise adopters. Getting Started: Navigate to the ARGUS frontend interface (Streamlit app) and click on the Settings tab. In the OCR Provider Configuration section, select your preferred provider. If using Mistral, enter your endpoint URL, API key, and model name. Click Update OCR Provider to apply changes immediately—no restart required. All new document processing will use your selected OCR engine. If your organization is looking to unlock document intelligence, here’s a structured path: Explore Mistral Document AI via Microsoft Foundry: Browse the model card, review endpoint specs, try sample documents to test accuracy and extraction structure Deploy and Pilot with ARGUS: Use the GitHub repo to spin up an end-to-end pipeline on a small workload (e.g., a batch of invoices or contracts) and compare manual vs AI-driven throughput and error-rates Define business value metrics: Track processing time, error rate, manual hours saved, and downstream impact (faster decision cycles, fewer reworks). Scale and govern: Once pilot proves value, expand into multiple document types, languages, geographies - and ensure governance (data handling, compliance, model-monitoring) Embed continuous improvement: As usage grows, feed back learnings, tune schema definitions, refine extraction rules, and extend into QA, insights or analytics layers Conclusion In today’s data-rich but document-heavy environment, the ability to truly understand documents (and not just digitize them) is becoming a strategic imperative. Mistral Document AI represents a next-generation shift: accurate, layout-aware, multilingual, structured. When paired with accelerators like ARGUS, enterprises can move from manual bottlenecks to streamlined, insight-rich document workflows. If you’re thinking about unlocking the value buried in your documents-be it invoices, contracts, forms or reports, now is the time. With mistral-document-ai-2512, what used to be a cost-center is now a potential performance lever. Ready to get started? Explore the model, and let your documents begin talking back.
Naomi Moneypenny
Mar 02, 2026 Place Microsoft Foundry Blog
4.9KViews
2likes
0Comments