Introducing GPT-5.4 in Microsoft Foundry
Today, we’re thrilled to announce that OpenAI’s GPT‑5.4 is now generally available in Microsoft Foundry: a model designed to help organizations move from planning work to reliably completing it in production environments. As AI agents are applied to longer, more complex workflows, consistency and follow‑through become as important as raw intelligence. GPT‑5.4 combines stronger reasoning with built‑in computer use capabilities to support automation scenarios and dependable execution across tools, files, and multi‑step workflows at scale.

GPT-5.4: Enhanced Reliability in Production AI

GPT-5.4 is designed for organizations operating AI in real production environments, where consistency, instruction adherence, and sustained context are critical to success. The model brings together advances in reasoning, coding, and agentic workflows to help AI systems not only plan tasks but complete them with fewer interruptions and less manual oversight. Compared with earlier generations, GPT-5.4 emphasizes stability across longer interactions, enabling teams to deploy agentic AI with greater confidence in day-to-day production use.

GPT-5.4 introduces advancements aimed at production-grade AI:

- More consistent reasoning over time, helping maintain intent across multi‑turn and multi‑step interactions
- Enhanced instruction alignment to reduce prompt tuning and oversight
- Improved latency for responsive, real-time workflows
- Integrated computer use capabilities for structured orchestration of tools, file access, data extraction, guarded code execution, and agent handoffs
- More dependable tool invocation, reducing prompt tuning and human oversight
- Higher‑quality generated artifacts, including documents, spreadsheets, and presentations with more consistent structure

Together, these improvements support AI systems that behave more predictably as tasks grow in length and complexity.
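As an illustration of the tool-invocation pattern described above, the sketch below builds an OpenAI-compatible chat request with a single tool attached. The deployment name "gpt-5.4" and the read_file tool are illustrative assumptions for this sketch, not documented values; your deployment name and tool schema will differ.

```python
# Sketch: composing a chat request that exposes one tool to the model.
# The deployment name and the read_file tool are illustrative assumptions.

def build_tool_request(deployment: str, user_message: str) -> dict:
    """Build an OpenAI-compatible chat request body with one tool attached."""
    return {
        "model": deployment,
        "messages": [
            {"role": "system", "content": "You are a workflow agent. Use tools when needed."},
            {"role": "user", "content": user_message},
        ],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "read_file",  # hypothetical tool for illustration
                    "description": "Read a file from the workspace.",
                    "parameters": {
                        "type": "object",
                        "properties": {"path": {"type": "string"}},
                        "required": ["path"],
                    },
                },
            }
        ],
        "tool_choice": "auto",  # let the model decide when to invoke the tool
    }

request_body = build_tool_request("gpt-5.4", "Summarize the Q3 report in reports/q3.md")
```

A body like this can be sent to any OpenAI-compatible chat completions endpoint; the model then decides whether to answer directly or emit a tool call for your orchestration layer to execute.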
From capability to real-world outcomes

GPT‑5.4 delivers practical value across a wide range of production scenarios where follow‑through and reliability are essential:

- Agent‑driven workflows, such as customer support, research assistance, and business process automation
- Enterprise knowledge work, including document drafting, data analysis, and presentation‑ready outputs
- Developer workflows, spanning code generation, refactoring, debugging support, and UI scaffolding
- Extended reasoning tasks, where logical consistency must be preserved across longer interactions

Teams benefit from reduced task drift, fewer mid‑workflow failures, and more predictable outcomes when deploying GPT‑5.4 in production.

GPT-5.4 Pro: Deeper analysis for complex decision workflows

GPT‑5.4 Pro is a premium variant designed for scenarios where analytical depth and completeness are prioritized over latency. Additional capabilities include:

- Multi‑path reasoning evaluation, allowing alternative approaches to be explored before selecting a final response
- Greater analytical depth, supporting problems with trade‑offs or multiple valid solutions
- Improved stability across long reasoning chains, especially in sustained analytical tasks
- Enhanced decision support, where rigor and thoroughness outweigh speed considerations

Organizations typically select GPT‑5.4 Pro when deeper analysis is required, such as scientific research and complex problem solving, while GPT‑5.4 remains the right choice for workloads that prioritize reliable execution and agentic follow‑through.

Microsoft Foundry: Enterprise‑Grade Control from Day One

GPT‑5.4 and GPT‑5.4 Pro are available through Microsoft Foundry, which provides the operational controls organizations need to deploy AI responsibly in production environments. Foundry supports policy enforcement, monitoring, version management, and auditability, helping teams manage AI systems throughout their lifecycle.
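Budgeting for a deployment is simple arithmetic over the published per-token rates. The sketch below encodes GPT-5.4's two Foundry pricing tiers as a quick cost estimator; the tier labels "standard" and "long_context" are our own names for the up-to-272K and above-272K input-token tiers.

```python
# Back-of-the-envelope cost estimator for GPT-5.4's tiered Foundry pricing.
# Rates are USD per million tokens; tier names are our own labels.

RATES = {
    "standard": {"input": 2.50, "cached_input": 0.25, "output": 15.00},      # <=272K input tokens
    "long_context": {"input": 5.00, "cached_input": 0.50, "output": 22.50},  # >272K input tokens
}

def estimate_cost(input_tokens, cached_tokens, output_tokens, tier="standard"):
    """Return the estimated USD cost for one request at the given tier."""
    r = RATES[tier]
    return (input_tokens * r["input"]
            + cached_tokens * r["cached_input"]
            + output_tokens * r["output"]) / 1_000_000

# 1M fresh input tokens + 1M output tokens at the standard tier:
# $2.50 + $15.00 = $17.50
cost = estimate_cost(1_000_000, 0, 1_000_000)
```

Note how heavily cached input shifts the economics: at $0.25 per million cached tokens, reusing a large shared prefix costs a tenth of sending it fresh.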
By deploying GPT‑5.4 through Microsoft Foundry, organizations can integrate advanced agentic capabilities into existing environments while aligning with security, compliance, and operational requirements from day one.

Get Started with GPT-5.4 in Microsoft Foundry

GPT‑5.4 sets a new bar for production‑ready AI by combining stronger reasoning with dependable execution. Through enterprise‑grade deployment in Microsoft Foundry, organizations can move beyond experimentation and confidently build AI systems that complete complex work at scale. Computer use capabilities will be introduced shortly after launch.

Pricing

For requests up to 272K input tokens, GPT‑5.4 in Microsoft Foundry is priced at $2.50 per million input tokens, $0.25 per million cached input tokens, and $15.00 per million output tokens. For requests above 272K input tokens, pricing is $5.00 per million input tokens, $0.50 per million cached input tokens, and $22.50 per million output tokens. GPT-5.4 is available at launch in Standard Global and Standard Data Zone (US), with additional deployment options coming soon. GPT‑5.4 Pro is priced at $30.00 per million input tokens and $180.00 per million output tokens, and is available at launch in Standard Global.

Build agents for real-world workloads. Start building with GPT‑5.4 in Microsoft Foundry today.

How Azure NetApp Files Object REST API powers Azure and ISV Data and AI services – on YOUR data
This article introduces the Azure NetApp Files Object REST API, a transformative solution for enterprises seeking seamless, real-time integration between their data and Azure's advanced analytics and AI services. By enabling direct, secure access to enterprise data, without costly transfers or duplication, the Object REST API accelerates innovation, streamlines workflows, and enhances operational efficiency. With S3-compatible object storage support, it empowers organizations to make faster, data-driven decisions while maintaining compliance and data security. Discover how this new capability unlocks business potential and drives a new era of productivity in the cloud.

Introducing GPT-5.3 Chat in Microsoft Foundry: A more grounded way to chat at enterprise scale
OpenAI’s GPT‑5.3 Chat marks the next step in the GPT‑5 series, designed to deliver more dependable, context‑aware chat experiences for enterprise workloads. The model emphasizes steadier instruction handling and clearer responses, supporting high‑volume, real‑world conversations with greater consistency. GPT‑5.3 Chat is now available via API in Microsoft Foundry, where teams can deploy production‑ready chat and agent experiences that are standardized, governed, and built to scale across the enterprise.

What’s new in GPT‑5.3 Chat

GPT‑5.3 Chat centers on predictable behavior, relevance, and response quality, helping teams build chat experiences that operate reliably across end‑to‑end workflows while aligning with enterprise safety and compliance expectations.

Fewer dead ends, more resolved conversations
- Reduces unnecessary refusals by responding more proportionately when safe context is available
- Supports compliant reformulation to keep interactions moving forward
- Enables end‑to‑end resolution in support, IT, and policy‑driven workflows

Grounded answers you can operationalize
- Combines built‑in web search with model reasoning to surface relevant, actionable information
- Prioritizes relevance and context over long lists of loosely related results
- Keeps responses actionable while maintaining enterprise controls and traceability

Consistent outputs at scale
- Improved tone, explanation quality, and instruction following
- Easier to template, govern, and monitor across apps
- Less downstream cleanup as usage scales

Built for production in Microsoft Foundry

Production‑grade infrastructure
- Observability, failover, quota management, and performance monitoring
- Designed for real workloads, not experiments
- Consistent behavior across regions and use cases without re‑architecting

Smarter scaling with quota tiers
- Automatic quota increases with sustained usage
- Fewer rate‑limit interruptions as demand grows
- Flexible tiers from Free through Tier 6

Security and compliance by default
- Identity, access controls, policy enforcement, and data boundaries built in
- Meets regulated‑industry requirements out of the box
- Teams can move fast without compromising trust

GPT-5.3 Chat in Microsoft Foundry is priced at $1.75 per million input tokens, $0.175 per million cached input tokens, and $14.00 per million output tokens.

Ready to build with GPT‑5.3 Chat in Foundry? Start turning reliable conversations into real applications. Explore GPT-5.3 Chat in Microsoft Foundry and begin building production‑ready chat and agent experiences today.

Introducing Phi-4-Reasoning-Vision to Microsoft Foundry
Vision reasoning models unlock a critical capability for developers: the ability to move beyond passive perception toward systems that can understand, reason over, and act on visual information. Instead of treating images, diagrams, documents, or UI screens as unstructured inputs, vision reasoning models enable developers to build applications that can interpret visual structure, connect it with textual context, and perform multi-step reasoning to reach actionable conclusions.

Today, we are excited to announce that Phi-4-Reasoning-Vision-15B is available in Microsoft Foundry and on Hugging Face. This model brings high‑fidelity vision to the reasoning‑focused Phi‑4 family, extending small language models (SLMs) beyond perception into structured, multi‑step visual reasoning for agents, analytical tools, and scientific workflows.

What’s new?

The Phi model family has advanced toward combining efficient visual understanding with strong reasoning in small language models. Earlier Phi‑4 models demonstrated reliable perception and grounding across images and text, while later iterations introduced structured reasoning to improve performance on complex tasks. Phi‑4‑Reasoning-Vision-15B brings these threads together, pairing high‑resolution visual perception with selective, task‑aware reasoning. As a result, the model can reason deeply when needed while remaining fast and efficient for perception‑focused scenarios, making it well suited for interactive, real‑world applications.

Key capabilities

Reasoning behavior is explicitly enabled via prompting: developers can turn reasoning on or off to balance latency and accuracy at runtime.
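The runtime toggle can be sketched as a small message builder. The exact prompt convention for enabling or disabling reasoning is defined in the model card; the system-instruction wording below is an illustrative assumption, as is the image URL.

```python
# Sketch of toggling reasoning at request time for a vision-chat request.
# The system-instruction wording is an assumption; consult the model card
# for the documented prompt convention.

def build_messages(question: str, image_url: str, reasoning: bool) -> list:
    """Build an OpenAI-style multimodal message list with a reasoning toggle."""
    system = ("Think step by step before answering."
              if reasoning else
              "Answer directly without extended reasoning.")
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": image_url}},
        ]},
    ]

# Low-latency, perception-focused call: reasoning off
fast = build_messages("What is the total in this chart?",
                      "https://example.com/chart.png", reasoning=False)
```

The same helper with `reasoning=True` would suit diagram-based math or multi-step chart analysis, where accuracy is worth the extra latency.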
The model is optimized for vision reasoning and can be used for:
- Diagram-based math
- Document, chart, and table understanding
- GUI interpretation and grounding for agent scenarios, to interpret screens and actions
- Computer-use agent scenarios
- General image chat and question answering

Benchmarks

The following results summarize Phi-4-reasoning-vision-15B performance across a set of established multimodal reasoning, mathematics, and computer use benchmarks. The following benchmarks are the result of internal evaluations.

| Benchmark | Phi-4-reasoning-vision-15B | Phi-4-reasoning-vision-15B (force no think) | Phi-4-mm-instruct | Kimi-VL-A3B-Instruct | gemma-3-12b-it | Qwen3-VL-8B-Instruct-4K | Qwen3-VL-8B-Instruct-32K | Qwen3-VL-32B-Instruct-4K | Qwen3-VL-32B-Instruct-32K |
|---|---|---|---|---|---|---|---|---|---|
| AI2D_TEST | 84.8 | 84.7 | 68.6 | 84.6 | 80.4 | 82.7 | 83 | 84.8 | 85 |
| ChartQA_TEST | 83.3 | 76.5 | 23.5 | 87 | 39 | 83.1 | 83.2 | 84.3 | 84 |
| HallusionBench | 64.4 | 63.1 | 56 | 65.2 | 65.3 | 73.5 | 74.1 | 74.4 | 74.9 |
| MathVerse_MINI | 44.9 | 43.8 | 32.4 | 41.7 | 29.8 | 54.5 | 57.4 | 64.2 | 64.2 |
| MathVision_MINI | 36.2 | 34.2 | 20 | 28.3 | 31.9 | 45.7 | 50 | 54.3 | 60.5 |
| MathVista_MINI | 75.2 | 68.7 | 50.5 | 67.1 | 57.4 | 77.1 | 76.4 | 82.5 | 81.8 |
| MMMU_VAL | 54.3 | 52 | 42.3 | 52 | 50 | 60.7 | 64.6 | 68.6 | 70.6 |
| MMStar | 64.5 | 63.3 | 45.9 | 60 | 59.4 | 68.9 | 69.9 | 73.7 | 74.3 |
| OCRBench | 76 | 75.6 | 62.6 | 86.5 | 75.3 | 89.2 | 90 | 88.5 | 88.5 |
| ScreenSpot_v2 | 88.2 | 88.3 | 28.5 | 89.8 | 3.5 | 91.5 | 91.5 | 93.7 | 93.9 |

Table 1: Accuracy comparisons relative to popular open-weight, non-thinking models

| Benchmark | Phi-4-reasoning-vision-15B | Phi-4-reasoning-vision-15B (force thinking) | Kimi-VL-A3B-Thinking | gemma-3-12b-it | Qwen3-VL-8B-Thinking-4K | Qwen3-VL-8B-Thinking-40K | Qwen3-VL-32B-Thinking-4K | Qwen3-VL-32B-Thinking-40K |
|---|---|---|---|---|---|---|---|---|
| AI2D_TEST | 84.8 | 79.7 | 81.2 | 80.4 | 83.5 | 83.9 | 86.9 | 87.2 |
| ChartQA_TEST | 83.3 | 82.9 | 73.3 | 39 | 78 | 78.6 | 78.5 | 79.1 |
| HallusionBench | 64.4 | 63.9 | 70.6 | 65.3 | 71.6 | 73 | 76.4 | 76.6 |
| MathVerse_MINI | 44.9 | 53.1 | 61 | 29.8 | 67.3 | 73.3 | 78.3 | 78.2 |
| MathVision_MINI | 36.2 | 36.2 | 50.3 | 31.9 | 43.1 | 50.7 | 60.9 | 58.6 |
| MathVista_MINI | 75.2 | 74.1 | 78.6 | 57.4 | 77.7 | 79.5 | 83.9 | 83.8 |
| MMMU_VAL | 54.3 | 55 | 60.2 | 50 | 59.3 | 65.3 | 72 | 72.2 |
| MMStar | 64.5 | 63.9 | 69.6 | 59.4 | 69.3 | 72.3 | 75.5 | 75.7 |
| OCRBench | 76 | 73.7 | 79.9 | 75.3 | 81.2 | 82 | 83.7 | 85 |
| ScreenSpot_v2 | 88.2 | 88.1 | 81.8 | 3.5 | 93.3 | 92.7 | 83.1 | 83.1 |

Table 2: Accuracy comparisons relative to popular open-weight, thinking models

All results were obtained using a consistent evaluation setup and prompts across models; numbers are provided for comparison and analysis rather than as leaderboard claims. For more information regarding benchmarks and evaluations, please read the technical paper on the Microsoft Research hub.

Suggested use cases and applications

Phi‑4‑Reasoning-Vision-15B supports applications that require both high‑fidelity visual perception and structured inference. Two representative scenarios include scientific and mathematical reasoning over visual inputs, and computer‑using agents (CUAs) that operate directly on graphical user interfaces. In both cases, the model provides grounded visual understanding paired with controllable, low‑latency reasoning suitable for interactive systems.

Computer use agents in retail scenarios

For computer use agents, Phi‑4‑Reasoning-Vision-15B provides the perception and grounding layer required to understand and act within live ecommerce interfaces. For example, in an online shopping experience, the model interprets screen content (products, prices, filters, promotions, buttons, and cart state) and produces grounded observations that agentic models like Fara-7B can use to select actions. Its compact size and low-latency inference make it well suited for CUA workflows and agentic applications.

Visual reasoning for education

Another practical use of visual reasoning models is education. A developer could build a K‑12 tutoring app with Phi‑4‑Reasoning‑Vision‑15B where students upload photos of worksheets, charts, or diagrams to get guided help, not answers. The model can understand the visual content, identify where the student went wrong, and explain the correct steps clearly.
Over time, the app can adapt by serving new examples matched to the student’s learning level, turning visual problem‑solving into a personalized learning experience.

Microsoft Responsible AI principles

At Microsoft, our mission to empower people and organizations remains constant, especially in the age of AI, where the potential for human achievement is greater than ever. We recognize that trust is foundational to AI adoption, and earning that trust requires a commitment to transparency, safety, and accountability. As with other Phi models, Phi-4-Reasoning-Vision-15B was developed with safety as a core consideration throughout training and evaluation. The model was trained on a mixture of public safety datasets and internally generated examples designed to elicit behaviors the model should appropriately refuse, in alignment with Microsoft’s Responsible AI Principles. These safety-focused training signals help the model recognize and decline requests that fall outside intended or acceptable use. Additional details on the model’s safety considerations, evaluation approach, and known limitations are provided in the accompanying technical blog and model card.

Getting started

Start using Phi‑4‑Reasoning-Vision-15B in Microsoft Foundry today. Microsoft Foundry provides a unified environment for model discovery, evaluation, and deployment, making it straightforward to move from initial experimentation to production use while applying appropriate safety and governance practices.

- Deploy the new model on Microsoft Foundry
- Learn more about the Phi family on Foundry Labs and in the Phi Cookbook
- Connect to the Microsoft Developer Community on Discord
- Read the technical paper on Microsoft Research
- Read more use cases on the Educators Developer blog

Grok 4.0 Goes GA in Microsoft Foundry and Grok 4.1 Fast Arrives with Major Enhancements
We first brought Grok 4.0 to Microsoft Foundry in September 2025, marking an important milestone in expanding Foundry’s multi-model ecosystem with frontier reasoning models from xAI. Since then, customer interest and usage have continued to build as developers explored Grok’s strengths in fast reasoning, sense-making, and interpretation of complex, ambiguous information. Today, we’re excited to announce that Grok 4.0 is now generally available (GA) in Microsoft Foundry, giving enterprises a production-ready path to deploy Grok at scale.

Building on that momentum, Grok 4.1 Fast (Reasoning and Non-Reasoning) is now available in Microsoft Foundry. Grok 4.1 introduces a suite of improvements that enhance conversation quality, creativity, and emotional intelligence while maintaining core reasoning strengths. According to xAI, Grok 4.1 delivers more natural, fluid dialogue compared with earlier versions.

Introducing Grok 4.1 Fast (Reasoning and Non-Reasoning)

Grok 4.1 Fast is optimized for speed, scale, and agentic execution, giving developers flexibility to choose between reasoning and non-reasoning variants depending on workload requirements.

- Grok 4.1 Fast (Reasoning): Designed for scenarios that require rapid multi-step reasoning, structured decision-making, and interpretation of complex inputs. This variant is well suited for agent workflows, analysis pipelines, and applications that need fast responses without sacrificing reasoning depth.
- Grok 4.1 Fast (Non-Reasoning): Optimized for maximum throughput and low latency, this variant is ideal for tasks such as summarization, classification, content transformation, and tool-driven execution, where deterministic speed and efficiency matter more than deep reasoning.

Together, these options allow teams to right-size performance and cost by selecting the appropriate Grok 4.1 Fast variant for each stage of an application, from high-volume preprocessing and orchestration to targeted reasoning tasks.
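Routing work between the two variants can be as simple as a lookup keyed on task type. The task categories and the model identifiers below are illustrative assumptions for this sketch, not Foundry's actual deployment names.

```python
# Illustrative router between the two Grok 4.1 Fast variants: the reasoning
# model for multi-step analysis, the non-reasoning model for high-throughput
# transformation. Task categories and model IDs are assumptions.

REASONING_TASKS = {"analysis", "planning", "decision", "agent_step"}
THROUGHPUT_TASKS = {"summarization", "classification", "transformation", "tool_call"}

def pick_grok_variant(task_type: str) -> str:
    """Map a pipeline stage's task type to the right Grok 4.1 Fast variant."""
    if task_type in REASONING_TASKS:
        return "grok-4.1-fast-reasoning"
    if task_type in THROUGHPUT_TASKS:
        return "grok-4.1-fast-non-reasoning"
    raise ValueError(f"Unknown task type: {task_type}")

model = pick_grok_variant("classification")
```

A router like this lets a pipeline run cheap, low-latency preprocessing on the non-reasoning variant and reserve the reasoning variant for the stages that need it.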
What’s New with Grok 4.1 Fast?

Grok 4.1 brings several enhancements that broaden the model’s applicability and user experience:

- Improved conversational quality: According to xAI, Grok 4.1 Fast offers smoother, more natural interaction patterns, making it more comfortable and intuitive to engage with, especially in multi-turn dialogues.
- Enhanced creativity and emotional awareness: According to xAI, Grok 4.1 Fast demonstrates stronger creative writing capabilities and greater emotional intelligence, helping it generate more expressive and engaging outputs that better align with human expectations.
- Reduced hallucination and better reliability: According to xAI, Grok 4.1 Fast produces fewer factual inaccuracies than its predecessor.

These enhancements make Grok 4.1 Fast a compelling choice for use cases that require engaging conversational interfaces, creative support, and rich natural language interaction.

As with all frontier AI models, Grok 4.1 Fast introduces new capabilities alongside new operational considerations. Microsoft’s safety and responsible AI evaluations indicate that Grok 4.1 Fast may demonstrate increased risks in safety testing compared with other models available through Azure. In practice, this means there may be an increased risk of generating explicit or potentially harmful content. To support responsible deployment, customers should implement system-level safety instructions and leverage Azure AI Content Safety (AACS) to help monitor and filter outputs. Because no single safety system can address every possible risk scenario, customers are encouraged to conduct their own evaluations and validation before deploying Grok 4.1 in production systems. To provide enhanced safety and enterprise reliability, Microsoft's deployment of Grok 4.1 features a system-applied safety prompt that cannot be disabled. Customers are expected to operate the model without attempting to bypass or interfere with this feature.
Enterprise-Ready Deployment via Microsoft Foundry

With Grok 4.0 now GA in Foundry, enterprises gain the ability to incorporate advanced reasoning models into their workflows while enjoying the governance, compliance, and operational tooling that Azure provides. Models hosted in Foundry can be deployed serverless or with provisioned throughput, and customers benefit from centralized billing, identity integration, and access to other Azure services. Foundry’s model catalog also includes other Grok variants such as Grok 4.1 Fast and related non-reasoning SKUs, giving enterprises flexibility to balance performance, latency, and cost depending on their workloads.

Pricing

| Model | Deployment | Input / 1M Tokens | Output / 1M Tokens | Availability |
|---|---|---|---|---|
| Grok 4.1 Fast (Non-Reasoning) | Global Standard | $0.20 | $0.50 | Public Preview on 2/27/2026 |
| Grok 4.1 Fast (Reasoning) | Global Standard | $0.20 | $0.50 | Public Preview on 3/4/2026 |

Looking Ahead

The combination of Grok’s deep reasoning capabilities with the enterprise readiness of Microsoft Foundry opens new possibilities for production AI applications, from complex analytical agents and research assistants to creative and customer-facing experiences. With Grok 4.1’s conversational refinements further raising the model’s usability and expressiveness, Foundry customers can now experiment with and scale a broader set of AI-driven solutions, all within a trusted, governed environment. As Microsoft continues to expand Foundry’s catalog and partners like xAI continue to innovate, organizations have more options than ever to power next-generation AI applications across industries, use cases, and domains.

Try Grok 4.1 Fast in the AI Model Catalog in Microsoft Foundry: Non-Reasoning and Reasoning.

Announcing GPT‑5.2‑Codex in Microsoft Foundry: Enterprise‑Grade AI for Secure Software Engineering
Enterprise developers know the grind: wrestling with legacy code, navigating complex dependency challenges, and waiting on security reviews that stall releases. OpenAI’s GPT‑5.2‑Codex flips that equation and helps engineers ship faster without cutting corners. It’s not just autocomplete; it’s a reasoning engine for real-world software engineering. Generally available starting today through Azure OpenAI in Microsoft Foundry Models, GPT‑5.2‑Codex is built for the realities of enterprise codebases: large repos, evolving requirements, and security constraints that can’t be overlooked. As OpenAI’s most advanced agentic coding model, it brings sustained reasoning and security-aware assistance directly into the workflows enterprise developers already rely on, backed by Microsoft’s secure and reliable infrastructure.

GPT-5.2-Codex at a Glance

GPT‑5.2‑Codex is designed for how software gets built in enterprise teams. You start with imperfect inputs, including legacy code, partial docs, screenshots, and diagrams, and work through multi‑step changes, reviews, and fixes. The model helps keep context, intent, and standards intact across that entire lifecycle, so teams can move faster without sacrificing quality or security.

What it enables

- Work across code and artifacts: Reason over source code alongside screenshots, architecture diagrams, and UI mocks, so implementation stays aligned with design intent.
- Stay productive in long‑running tasks: Maintain context across migrations, refactors, and investigations, even as requirements evolve.
- Build and review with security in mind: Get practical support for secure coding patterns, remediation, reviews, and vulnerability analysis, where correctness matters as much as speed.
Feature Specs (quick reference)

- Context window: 400K tokens (approximately 100K lines of code)
- Supported languages: 50+, including Python, JavaScript/TypeScript, C#, Java, Go, Rust
- Multimodal inputs: Code, images (UI mocks, diagrams), and natural language
- API compatibility: Drop-in replacement for existing Codex API calls

Use cases where it really pops

- Legacy modernization with guardrails: Safely migrate and refactor “untouchable” systems by preserving behavior, improving structure, and minimizing regression risk.
- Large‑scale refactors that don’t lose intent: Execute cross‑module updates and consistency improvements without the typical “one step forward, two steps back” churn.
- AI‑assisted code review that raises the floor: Catch risky patterns, propose safer alternatives, and improve consistency, especially across large teams and long‑lived codebases.
- Defensive security workflows at scale: Accelerate vulnerability triage, dependency/path analysis, and remediation when speed matters, but precision matters more.
- Lower cognitive load in long, multi‑step builds: Keep momentum across multi‑hour sessions: planning, implementing, validating, and iterating with context intact.

Pricing

| Model | Input Price / 1M Tokens | Cached Input Price / 1M Tokens | Output Price / 1M Tokens |
|---|---|---|---|
| GPT-5.2-Codex | $1.75 | $0.175 | $14.00 |

Security Aware by Design, not as an Afterthought

For many organizations, AI adoption hinges on one nonnegotiable question: can this be trusted in security-sensitive workflows? GPT-5.2-Codex meaningfully advances the Codex lineage in this area. As models grow more capable, we’ve seen that general reasoning improvements naturally translate into stronger performance in specialized domains, including defensive cybersecurity.
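The 400K-token window and the post's roughly 100K-lines-of-code equivalence (about 4 tokens per line) give a quick planning heuristic for whether a repo slice fits in a single request. This is a rough estimate, not a real tokenizer; the 50K-token output reserve is our own assumption.

```python
# Rough fit check against GPT-5.2-Codex's 400K-token context window, using
# the ~100K-lines-of-code equivalence (about 4 tokens per line).
# A planning heuristic only; use a real tokenizer for precise counts.

CONTEXT_WINDOW_TOKENS = 400_000
TOKENS_PER_LINE_ESTIMATE = 4  # implied by 400K tokens ~= 100K lines

def fits_in_context(total_lines: int, reserved_for_output: int = 50_000) -> bool:
    """Return True if an estimated total_lines of code fits alongside the reserve."""
    estimated = total_lines * TOKENS_PER_LINE_ESTIMATE
    return estimated <= CONTEXT_WINDOW_TOKENS - reserved_for_output

fits_in_context(80_000)   # ~320K tokens: fits with room for output
fits_in_context(95_000)   # ~380K tokens: does not fit once output is reserved
```

For repos beyond this budget, the usual approach is to send only the modules relevant to the change plus their direct dependencies.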
With GPT‑5.2‑Codex, this shows up in practical ways:

- Improved ability to analyze unfamiliar code paths and dependencies
- Stronger assistance with secure coding patterns and remediation
- More dependable support during code reviews, vulnerability investigations, and incident response

At the same time, Microsoft continues to deploy these capabilities thoughtfully, balancing access, safeguards, and platform-level controls so enterprises can adopt AI responsibly as capabilities evolve.

Why Run GPT-5.2-Codex on Microsoft Foundry?

Powerful models matter, but where and how they run matters just as much for enterprises. Organizations choose Microsoft Foundry because it combines frontier AI with Azure's enterprise-grade fundamentals:

- Integrated security, compliance, and governance: Deploy GPT-5.2-Codex within existing Azure security boundaries, identity systems, and compliance frameworks, without reinventing controls.
- Enterprise-ready orchestration and tooling: Build, evaluate, monitor, and scale AI-powered developer experiences using the same platform teams already rely on for production workloads.
- A unified path from experimentation to scale: Foundry makes it easier to move from proof of concept to real deployment, without changing platforms, vendors, or operating assumptions.
- Trust at the platform level: For teams working in regulated or security-critical environments, Foundry and Azure provide assurances that go beyond the model itself.

Together with GitHub Copilot, Microsoft Foundry provides a unified developer experience, from in‑IDE assistance to production‑grade AI workflows, backed by Azure’s security, compliance, and global scale. This is where GPT-5.2-Codex becomes not just impressive but adoptable.

Get Started Today

Explore GPT‑5.2‑Codex in Microsoft Foundry today. Start where you already work: try GPT‑5.2‑Codex in GitHub Copilot for everyday coding, and scale the same model to larger workflows using Azure OpenAI in Microsoft Foundry.
Let’s build what’s next with speed and security.

Unlocking Document Understanding with Mistral Document AI in Microsoft Foundry
Enterprises today face a familiar yet formidable challenge: mountains of documents (contracts, invoices, reports, forms) remain locked in unstructured formats. Traditional OCR (optical character recognition) captures text but often struggles with context, layout complexity, or multilingual content. The result? Slow workflows, error-prone manual reviews, and missed insights.

Enter mistral-document-ai-2512 in Microsoft Foundry. This new model brings together high-end OCR (mistral-ocr-2512) and intelligent document understanding (mistral-small-2506) to turn unstructured documents into actionable data. It doesn’t just “read” pages, it understands them: multi-column layouts, handwritten annotations, tables with merged cells, and multilingual content, all processed with enterprise-grade speed and precision. In this blog, we’ll explore what Mistral Document AI 2512 is, why it matters, how it stacks up, and the business impact it promises, especially when paired with solution accelerators like ARGUS.

Meet Mistral Document AI

Mistral Document AI is an enterprise-grade document understanding model offered via Microsoft Foundry. It’s built to convert both physical (scans, photos) and digital (PDFs, DOCX) documents into highly structured, machine-readable outputs. Key features include:

- Top-tier accuracy: According to benchmarks, Mistral OCR 2512 displays significantly higher accuracy than many alternatives, especially on scanned documents and complex layouts. For example, in comparisons it achieved ~95.9% overall accuracy vs. ~89-91% for other platforms.
- Global / multilingual reach: In language-by-language tests (Russian, French, German, Spanish, Chinese, etc.), Mistral’s error-rate and fuzzy-match metrics reached 99%+ in many cases.
- Layout and context awareness: It’s built to not just extract linear text but understand multi-column layouts, tables, charts, images, handwritten input, and more.
- Structured output functionality: The model supports structured extraction (JSON) and markup (Markdown with interleaved images), preserving document structure for downstream systems.
- Enterprise-ready deployment: With availability via Microsoft Foundry and support for private/secure inference, the model is geared for regulated industries and high-volume workflows.

Putting it another way: where traditional OCR stops at “here’s the raw text on page 7”, Mistral Document AI 2512 can say “here’s the vendor invoice, here are the line items, here’s the total, here’s the signature block, and here’s the part that was handwritten”, ready to plug into downstream systems.

Business Impact & Industry Examples

Mistral Document AI isn’t just another OCR tool; it’s a strategic enabler that turns document-heavy operations into intelligent, automated workflows. The business value comes down to four key advantages:

- Speed and efficiency: Automating document understanding eliminates manual reviews and retyping. Tasks that took days can be done in minutes, accelerating core business processes.
- Accuracy and consistency: With 99%+ recognition accuracy and deep layout understanding, Mistral delivers cleaner data and fewer downstream errors, essential in compliance-critical or analytics-driven operations.
- Cost and productivity gains: Reducing manual extraction frees teams for higher-value work, cutting operational costs while increasing output per employee.
- Scalability and adaptability: Cloud-native performance allows organizations to scale document processing instantly during peak loads, across multiple languages and formats, without sacrificing quality.

Overall, mistral-document-ai-2512 excels where consistency and quality are critical.

Industry and Use Cases

In regulated industries or big-data scenarios, even a small improvement in accuracy or speed can translate into substantial business gains. Its benchmarks indicate not just incremental progress but a major step forward, giving enterprises a powerful new engine for their document workflows. Here’s where that impact becomes tangible:

- Financial services: Banks and insurers handle vast document volumes (loan applications, KYC forms, and claims reports) where data integrity and auditability are non-negotiable. Mistral automates extraction, classification, and clause identification across diverse formats, improving turnaround time and compliance accuracy while reducing manual handling costs.
- Healthcare & life sciences: Clinical records, lab results, and insurance claims often combine handwritten, tabular, and multi-language content. Mistral’s layout awareness and multilingual support ensure clean, structured datasets for downstream analytics and regulatory submissions.
- Manufacturing & logistics: From quality certificates to shipping manifests, Mistral streamlines the flow of operational documents. It can extract production parameters, vendor data, and timestamps at scale, building a unified, queryable data layer that supports supply chain traceability.
- Legal & public sector: Legal teams and agencies depend on consistency and transparency. Mistral helps index, summarise, and validate contracts or permits with full structural fidelity, dramatically cutting review cycles while maintaining evidential quality.
- Retail & consumer goods: Retailers process supplier invoices, product specifications, and marketing briefs from global partners. With Mistral’s multilingual precision and structure preservation, global document flows become searchable and analytics-ready.

Across these industries, the result is the same: cleaner data, faster throughput, and fewer human errors, the foundation for more reliable decisions and more agile operations.

Pricing

ARGUS: A ready-to-implement accelerator to start using Mistral Document AI

To spin up a solution faster, you can leverage solution accelerators such as ARGUS (an open-source repository available on GitHub). ARGUS serves as a full-pipeline implementation: from document ingestion, through OCR/extraction (via Mistral Document AI), to downstream processing and structured output. It shows how to deploy end to end, integrate with storage, preprocess documents, handle large-scale batches, output JSON schemas, and integrate into existing business workflows.

Mistral Document AI Integration

ARGUS now offers flexible OCR provider selection, with Mistral Document AI as one of several options. This enhancement gives you the freedom to choose the best OCR engine for your specific document processing needs.
Key Features:

- Dual provider support: Toggle between Azure Document Intelligence (default) and Mistral Document AI
- Runtime switching: Change OCR providers on the fly through the Settings UI without redeployment
- Simple configuration: Set up Mistral via environment variables (OCR_PROVIDER, MISTRAL_DOC_AI_ENDPOINT, MISTRAL_DOC_AI_KEY) or the web interface
- Seamless integration: Both providers expose the same interface, ensuring consistent behavior across your document processing pipeline

Why This Matters:

Different OCR engines excel at processing different document content. Azure Document Intelligence offers enterprise-grade form and table recognition, while Mistral Document AI 2512 additionally enables extraction to structured JSON with customizable schemas, document classification, and image processing, including text, charts, and signatures. It can convert charts into tables, extract fine print from figures, and even define custom image types for specialized workflows. Now you can select the optimal provider for each use case.

In effect, instead of building from scratch, ARGUS gives you the legs to run: pipeline orchestration, ingestion, error handling, schema mapping, and output integration, all wired to Mistral's engine. This significantly accelerates time-to-value and reduces risk for enterprise adopters.

Getting Started:

1. Navigate to the ARGUS frontend interface (Streamlit app) and click on the Settings tab.
2. In the OCR Provider Configuration section, select your preferred provider.
3. If using Mistral, enter your endpoint URL, API key, and model name.
4. Click Update OCR Provider to apply changes immediately; no restart is required.
5. All new document processing will use your selected OCR engine.
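The environment-variable configuration described above can be sketched in a few lines of Python. The variable names (OCR_PROVIDER, MISTRAL_DOC_AI_ENDPOINT, MISTRAL_DOC_AI_KEY) come from the ARGUS documentation, but the provider string values and the validation helper below are illustrative only, not part of ARGUS itself:

```python
import os

def load_ocr_config(env: dict) -> dict:
    """Read OCR provider settings, falling back to Azure Document Intelligence.

    Illustrative sketch: the provider identifiers used here are assumptions,
    not values taken from the ARGUS source.
    """
    provider = env.get("OCR_PROVIDER", "azure-document-intelligence")
    config = {"provider": provider}
    if provider == "mistral-document-ai":
        # Mistral requires an endpoint and key; fail fast if either is missing.
        missing = [k for k in ("MISTRAL_DOC_AI_ENDPOINT", "MISTRAL_DOC_AI_KEY")
                   if not env.get(k)]
        if missing:
            raise ValueError(f"Missing required settings: {', '.join(missing)}")
        config["endpoint"] = env["MISTRAL_DOC_AI_ENDPOINT"]
        config["key"] = env["MISTRAL_DOC_AI_KEY"]
    return config

if __name__ == "__main__":
    # In a real deployment these come from the container/app environment.
    print(load_ocr_config(dict(os.environ)))
```

Because both providers expose the same interface to the rest of the pipeline, the only deployment-time difference is which block of settings is populated.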
If your organization is looking to unlock document intelligence, here's a structured path:

1. Explore Mistral Document AI via Microsoft Foundry: Browse the model card, review endpoint specs, and try sample documents to test accuracy and extraction structure
2. Deploy and pilot with ARGUS: Use the GitHub repo to spin up an end-to-end pipeline on a small workload (e.g., a batch of invoices or contracts) and compare manual vs. AI-driven throughput and error rates
3. Define business value metrics: Track processing time, error rate, manual hours saved, and downstream impact (faster decision cycles, fewer reworks)
4. Scale and govern: Once the pilot proves value, expand into multiple document types, languages, and geographies, and ensure governance (data handling, compliance, model monitoring)
5. Embed continuous improvement: As usage grows, feed back learnings, tune schema definitions, refine extraction rules, and extend into QA, insights, or analytics layers

Conclusion

In today's data-rich but document-heavy environment, the ability to truly understand documents (and not just digitize them) is becoming a strategic imperative. Mistral Document AI represents a next-generation shift: accurate, layout-aware, multilingual, structured. When paired with accelerators like ARGUS, enterprises can move from manual bottlenecks to streamlined, insight-rich document workflows.

If you're thinking about unlocking the value buried in your documents, be it invoices, contracts, forms, or reports, now is the time. With mistral-document-ai-2512, what used to be a cost center is now a potential performance lever. Ready to get started? Explore the model, and let your documents begin talking back.

Announcing extended support for Fine Tuning gpt-4o and gpt-4o-mini
At Build 2025, we announced post-retirement, extended deployment and inference support for fine-tuned models. Today, we're excited to announce we're extending fine-tuning training for current customers of our most popular Azure OpenAI models: gpt-4o (2024-08-06) and gpt-4o-mini (2024-07-18). Hundreds of customers have pushed trillions of tokens through fine-tuned versions of these models, and we're happy to provide even more runway for your AI agents and applications.

Already using these models in Foundry? We have you covered as the only provider of fine tuning for gpt-4o and gpt-4o-mini come April. Keep fine tuning!

Not yet using Microsoft Foundry? Get started today by migrating your training data to Microsoft Foundry and fine-tune using Global or Standard Training for gpt-4o and gpt-4o-mini with your existing OpenAI code. You'll have the runway to continuously fine-tune or update your models. You have until March 31st, 2026, to become a fine-tuning customer of these models.

| Model | Version | Training retirement date | Deployment retirement date |
|---|---|---|---|
| gpt-4o | 2024-08-06 | No earlier than 2026-09-30 1 | 2027-03-31 |
| gpt-4o-mini | 2024-07-18 | No earlier than 2026-09-30 1 | 2027-03-31 |
| gpt-4.1 | 2025-04-14 | At base model retirement | One year after training retirement |
| gpt-4.1-mini | 2025-04-14 | At base model retirement | One year after training retirement |
| gpt-4.1-nano | 2025-04-14 | At base model retirement | One year after training retirement |
| o4-mini | 2025-04-16 | At base model retirement | One year after training retirement |

1 For existing customers only. Otherwise, training retirement occurs at base model retirement.
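Because Foundry accepts the same OpenAI SDK calls, kicking off a fine-tuning job can be sketched as below. This is a minimal illustration, not an official sample: the endpoint URL, API key, training file ID, and hyperparameter values are placeholders you would replace with your own resource values.

```python
def fine_tune_job_params(training_file_id: str,
                         model: str = "gpt-4o-2024-08-06") -> dict:
    """Assemble parameters for a fine-tuning job.

    The model string is the gpt-4o snapshot named in the table above;
    the hyperparameters are illustrative.
    """
    return {
        "model": model,
        "training_file": training_file_id,
        "hyperparameters": {"n_epochs": 3},
    }

def start_fine_tune(training_file_id: str, endpoint: str, api_key: str):
    # Same OpenAI SDK call you already use; only the base_url changes
    # to point at your Foundry resource (placeholder values below).
    from openai import OpenAI
    client = OpenAI(base_url=endpoint, api_key=api_key)
    return client.fine_tuning.jobs.create(**fine_tune_job_params(training_file_id))

if __name__ == "__main__":
    job = start_fine_tune(
        training_file_id="file-abc123",  # returned when you upload training data
        endpoint="https://<your-resource>.openai.azure.com/openai/v1/",
        api_key="<your-api-key>",
    )
    print(job.id, job.status)
```

Once the job completes, the resulting fine-tuned model is deployed and versioned like any other model in Foundry, which is what the extended deployment dates above apply to.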
Within the FLUX.2 family, FLUX.2 [flex] is the specialist for text and graphic design production, while FLUX.2 [pro] is optimized for photorealistic and cinematic-quality outputs. Use FLUX.2 [flex] when typography, logos, UI copy, or on-brand visual accuracy is the priority.

Model Overview

FLUX.2 [flex] delivers fine-grained control for precise results. It is purpose-built for typography and text-forward workflows, maintaining small details in complex scenes while enabling rapid iteration at scale. Teams use it for production-grade brand design, product UI prototyping, and any workflow where accurate text rendering is critical. This positioning makes FLUX.2 [flex] ideal for platform builders and creative teams who need to balance speed, scale, and cost while maintaining creative control.

FLUX.2 [flex] is now available as a gated Public Preview. Fill out the Black Forest Labs Flux Models Limited Access Application form to get access to these models today.

Key Capabilities

- Best-in-class text rendering for logos, UI copy, product packaging, and social graphics
- Fine-grained control: adjust inference steps and guidance scale to dial in precise outputs for production workflows
- Detail preservation: maintains fine details and small elements in complex scenes, keeping brand assets sharp at any resolution
- Multi-reference editing: supports up to 8 reference images via API for complex multi-image compositing workflows

Use Cases

FLUX.2 [flex] is specialized for workflows where accurate text rendering and fine-grained visual control are non-negotiable. It excels across brand design, product UI prototyping, social graphics, and any project requiring readable typography, precise layouts, or clean on-brand assets.
Brand and Graphic Design

- Generate precise, on-brand graphic design assets with clean, readable typography across logos, packaging, and marketing collateral
- Iterate across typefaces, layout styles, and color systems without sacrificing text accuracy or fine detail
- Produce production-ready social graphics and campaign visuals with correctly rendered copy

Creative & Media Production

- Render multi-word headlines, body copy blocks, and editorial layouts that adapt seamlessly to any style: modern, classic, handwritten, or dynamic 3D lettering
- Generate magazine covers, editorial spreads, and text-heavy visual layouts with correctly spaced, readable fonts at production quality
- Test copywriting concepts and headline variations at 3x the speed of previous-generation models

Product UI Prototyping

- Rapidly prototype product UIs with rich text components, accurate button labels, navigation copy, and on-screen text
- Build and iterate on app screens, dashboards, and design systems where text legibility and layout precision are critical

💡 Tip: Use FLUX.2 [pro] in Microsoft Foundry when your workflow prioritizes photorealistic imagery or cinematic-quality final renders rather than text-forward or graphic design outputs.

Join Black Forest Labs on Model Mondays

YouTube Livestream: March 2, 2026

Stephen Batifol, Developer Advocate at Black Forest Labs, joins Model Mondays to show how teams use BFL's generative image models on Microsoft Foundry to move from experimentation to real-world creative workflows at scale, with fewer touch-ups, stronger brand fidelity, and state-of-the-art image quality. This hands-on session features a live demo and practical patterns for building scalable image pipelines, so you can ship complete creative campaigns in days, not weeks. RSVP Here

Pricing

Foundry Models are fully hosted and managed on Azure.
FLUX.2 [flex] is available through pay-as-you-go on the Global Standard deployment type with the following pricing:

| Model | Type | Pricing |
|---|---|---|
| FLUX.2-flex | Global Standard | $0.05 per megapixel (MP) |

Important notes:

- For pricing, resolution is always rounded up to the next megapixel, separately for each reference image and for the generated image
- 1 megapixel is counted as 1024x1024 pixels
- For multiple reference images, each reference image is counted as 1 megapixel
- Images exceeding 4 megapixels are resized to 4 megapixels

Reference the Foundry Models pricing page for current pricing.

Build Trustworthy AI Solutions

Black Forest Labs models in Foundry Models are delivered under the Microsoft Product Terms, giving you enterprise-grade security and compliance out of the box. Each FLUX endpoint offers Content Safety controls and guardrails. Runtime protections include built-in content-safety filters, role-based access control, virtual-network isolation, and automatic Azure Monitor logging. Governance signals stream directly into Azure Policy, Purview, and Microsoft Sentinel, giving security and compliance teams real-time visibility. Together, Microsoft's capabilities let you create with more confidence, knowing that privacy, security, and safety are woven into every Black Forest Labs deployment from day one.

Getting Started with FLUX.2 in Microsoft Foundry

1. If you don't have an Azure subscription, you can sign up for an Azure account here.
2. Search for FLUX.2-flex in the model catalog in Microsoft Foundry under "Build."
3. Open the model card in the model catalog.
4. Click Deploy to obtain the inference API and key.
5. View your deployment under Build > Models. You should land on the deployment page, which shows you the API and key, in less than a minute.
6. Try out your prompts in the playground, or use the API and key with various clients.

Learn More

FLUX.2 [flex] is now available as a gated Public Preview.
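The billing rules above reduce to a small calculation. The sketch below is an illustrative estimate of the stated rounding rules, not an official billing implementation; in particular, it assumes reference images are rounded up per image the same way the generated image is:

```python
import math

PRICE_PER_MP = 0.05   # FLUX.2-flex Global Standard: $0.05 per megapixel
MP = 1024 * 1024      # 1 megapixel is counted as 1024x1024 pixels

def billable_megapixels(width: int, height: int) -> int:
    """Round an image up to the next whole megapixel, capped at 4 MP
    (images exceeding 4 MP are resized to 4 MP)."""
    return min(math.ceil((width * height) / MP), 4)

def estimate_cost(output_size: tuple, reference_sizes: list) -> float:
    """Estimate per-request cost: the generated image plus each reference
    image, each rounded up separately (assumption; see lead-in)."""
    mp = billable_megapixels(*output_size)
    for w, h in reference_sizes:
        mp += billable_megapixels(w, h)
    return round(mp * PRICE_PER_MP, 4)

# Example: a 2048x2048 output (4 MP) plus two 1024x1024 references (1 MP each)
# is billed as 6 MP, i.e. 6 * $0.05 = $0.30.
```

Note the interaction of the rules: a 1025x1024 image already bills as 2 MP because rounding happens per image, before any totals are summed.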
Fill out the Black Forest Labs Flux Models Limited Access Application form to get access to these models today.

▶️ RSVP for the next Model Mondays LIVE on YouTube or watch On-Demand
👩💻 Explore the FLUX.2 Documentation on Microsoft Learn
👋 Continue the conversation on Discord

New Azure OpenAI models bring fast, expressive, and real-time AI experiences in Microsoft Foundry
Modern AI applications, whether voice-first experiences or large software systems, rarely fit into a single prompt. Real work unfolds over time: maintaining context, following instructions, invoking tools, and adapting as requirements evolve. When these foundations break down through latency spikes, instruction drift, or unreliable tool calls, both user conversations and developer workflows suffer. OpenAI's latest models address this shared challenge by prioritizing continuity and reliability across real-time interaction and long-running engineering tasks.

Starting today, GPT-Realtime-1.5, GPT-Audio-1.5, and GPT-5.3-Codex are rolling out in Microsoft Foundry. Together, these models reflect the growing needs of the modern developer and push the needle from short, stateless interactions toward AI systems that can reason, act, and collaborate over time.

GPT-5.3-Codex at a glance

GPT-5.3-Codex brings together advanced coding capability with broader reasoning and professional problem solving in a single model built for real engineering work. It unifies the frontier coding performance of GPT-5.2-Codex with the reasoning and professional knowledge capabilities of GPT-5.2 in one system. This shifts the experience from optimizing isolated outputs to supporting longer-running development efforts, where repositories are large, changes span multiple steps, and requirements aren't always fully specified at the start.

What's improved

- 25% faster execution time than its predecessors, according to OpenAI, so developers can accelerate development of new applications
- Built for long-running tasks that involve research, tool use, and complex, multi-step execution while maintaining context
- Mid-task steerability and frequent updates allow developers to redirect and collaborate with the model as it works, without losing context
- Stronger computer-use capabilities allow developers to execute across the full spectrum of technical work
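Calling a GPT-5.3-Codex deployment from Foundry uses the same chat interface as other OpenAI models. The sketch below is illustrative only: the endpoint URL, API key, and deployment name "gpt-5.3-codex" are placeholders for your own resource values.

```python
def build_refactor_request(code: str, deployment: str = "gpt-5.3-codex") -> dict:
    """Assemble a chat request asking the model to refactor a snippet.

    The deployment name is a placeholder; use whatever name you gave
    your deployment in Foundry.
    """
    return {
        "model": deployment,
        "messages": [
            {"role": "system",
             "content": "You are a careful refactoring assistant."},
            {"role": "user",
             "content": f"Refactor this code and explain the changes:\n\n{code}"},
        ],
    }

def refactor(client, code: str) -> str:
    # `client` is an openai.OpenAI instance pointed at your Foundry endpoint.
    response = client.chat.completions.create(**build_refactor_request(code))
    return response.choices[0].message.content

if __name__ == "__main__":
    from openai import OpenAI  # same SDK you already use; only base_url changes
    client = OpenAI(
        base_url="https://<your-resource>.openai.azure.com/openai/v1/",
        api_key="<your-api-key>",
    )
    print(refactor(client, "def add(a,b): return a+b"))
```

For the long-running, multi-step agentic workflows described above you would layer tool definitions and iteration on top of this same request shape; the single-turn call here is just the entry point.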
Common use cases

Developers and teams can apply GPT-5.3-Codex across a wide range of scenarios, including:

- Refactoring and modernizing large or legacy applications
- Performing multi-step migrations or upgrades
- Running agentic developer workflows that span analysis, implementation, testing, and remediation
- Automating code reviews, test generation, and defect detection
- Supporting development in security-sensitive or regulated environments

Pricing

| Model | Input price / 1M tokens | Cached input price / 1M tokens | Output price / 1M tokens |
|---|---|---|---|
| GPT-5.3-Codex | $1.75 | $0.175 | $14.00 |

GPT-Realtime-1.5 and GPT-Audio-1.5 at a glance

These models deliver measurable gains in reasoning and speech understanding for real-time voice interactions on Microsoft Foundry. In OpenAI's evaluations, GPT-Realtime-1.5 shows a +5% lift on Big Bench Audio (reasoning), a +10.23% improvement in alphanumeric transcription, and a +7% gain in instruction following, while maintaining low-latency performance.

What's improved

- More natural-sounding speech: Audio output is smoother and more conversational, with improved pacing and prosody
- Higher audio quality: Clearer, more consistent audio output across supported voices
- Improved instruction following: Better alignment with developer-provided system and user instructions during live interactions
- Function calling support: Enables structured, tool-driven interactions within real-time audio flows
Common use cases

Developers are using GPT-Realtime-1.5 and GPT-Audio-1.5 for scenarios where low-latency voice interaction is essential, including:

- Conversational voice agents for customer support or internal help desks
- Voice-enabled assistants embedded in applications or devices
- Live voice interfaces for kiosks, demos, and interactive experiences
- Hands-free workflows where audio input and output replace keyboard interaction

Pricing

| Model | Text input | Text cached input | Text output | Audio input | Audio cached input | Audio output | Image input | Image cached input | Image output |
|---|---|---|---|---|---|---|---|---|---|
| GPT-Realtime-1.5 | $4.00 | $0.04 | $16.00 | $32.00 | $0.40 | $64.00 | $4.00 | $0.04 | $16.00 |
| GPT-Audio-1.5 | $2.50 | n/a | $10.00 | $32.00 | n/a | $64.00 | $2.50 | n/a | $10.00 |

Getting started in Microsoft Foundry

Start building in Microsoft Foundry, evaluate performance, and explore Azure OpenAI models today. Foundry brings evaluation, deployment, and governance into a single workflow, helping teams progress from experiments to scalable applications while maintaining security and operational controls.
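As a rough sanity check on the pricing table above, per-request cost can be estimated from token counts by modality. The helper below is illustrative only; it transcribes the GPT-Realtime-1.5 text and audio rates from the table and assumes, as with the Codex table, that they are quoted per 1M tokens:

```python
# GPT-Realtime-1.5 rates from the pricing table above, assumed per 1M tokens.
RATES = {
    "text_in": 4.00, "text_out": 16.00,
    "audio_in": 32.00, "audio_out": 64.00,
}

def estimate_request_cost(tokens: dict) -> float:
    """Estimate dollar cost of one request given token counts per modality.

    Keys must match RATES (e.g. {"text_in": 2000, "audio_out": 3000}).
    """
    return round(sum(RATES[kind] * count / 1_000_000
                     for kind, count in tokens.items()), 6)

# Example: 2,000 text input tokens, 5,000 audio input tokens, and
# 3,000 audio output tokens:
#   2000*4 + 5000*32 + 3000*64 = 360,000 token-dollars per 1M tokens,
# i.e. an estimated $0.36 for the request.
```

Because audio tokens are priced roughly 4x to 8x higher than text tokens here, trimming spoken output length tends to matter far more for cost than trimming prompts.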