Microsoft Foundry Blog
Foundry Models at Ignite 2025: Why Integration Wins in Enterprise AI

Yina Arenas
Nov 18, 2025

New models, model router, priority processing, fine-tuning, and more

As the AI landscape accelerates, organizations have outgrown siloed tools and limited model access. They now expect the best-performing AI models on their platform of choice — not just to prototype, but to power intelligent applications and autonomous agents that deliver real business outcomes. The problem is no longer finding powerful models; it’s integrating them seamlessly with enterprise data, systems, and workflows in a secure, governed, and scalable way. 

Microsoft Foundry addresses this challenge head-on. It’s the unified platform that brings together the widest selection of models on any cloud with the enterprise-grade infrastructure needed to run them responsibly at scale. With more than 11,000 models, Foundry enables organizations to discover, evaluate, and operationalize AI with built-in safety, observability, and governance, transforming fragmented experiments into integrated, production-ready AI systems.

This year, we’re strengthening that foundation with four major innovations: 

  • New models from Anthropic, Cohere, NVIDIA, and more — expanding access to cutting-edge reasoning, multimodality, and creative intelligence. 
  • Model router (GA) — delivering intelligent workload routing, performance benchmarking, and cost optimization across models. 
  • Priority Processing (Public Preview) — providing SLA-backed, low-latency inference for mission-critical workloads. 
  • Fine-tuning (Preview) — offering faster training, richer data integration, and broader model support in a unified experience. 

Together, these capabilities move enterprises beyond experimentation towards true AI system integration — where models, data, and workflows operate in harmony to deliver measurable ROI and sustainable competitive advantage. 

 

From Model Selection to System Integration 

At Microsoft Ignite 2025, we’re showcasing how this foundation accelerates innovation end-to-end — with integrated content safety, flexible deployment options, and the governance and interoperability enterprises need to scale AI confidently and responsibly. 

 

  1. Leverage the widest selection of models on any cloud

Our vision for Foundry Models is simple: bring the world’s best intelligence into one trusted platform — from frontier breakthroughs to specialized, domain-specific models — so enterprises can choose the right model for every workload without compromise.

This week at Microsoft Ignite 2025, that vision takes a major step forward. Customers can now access Anthropic’s Claude models in Microsoft Foundry, unlocking deep-reasoning capabilities designed for the realities of enterprise development — from multi-document research to large-scale code understanding and tight integration with productivity tools.

We are also excited that Cohere’s Command A, Embed 4, and Rerank models are coming soon as direct-from-Azure models, unlocking a new class of enterprise-ready language models built for retrieval-augmented generation, knowledge workflows, and domain-specific reasoning. As direct-from-Azure models, Cohere’s models are deployed on Azure’s secure, compliant infrastructure — giving customers low-latency inference, consistent SLAs, unified observability, and seamless integration with enterprise data systems. This approach provides organizations with all the benefits of Cohere’s powerful models alongside the governance, security, and operational tooling of Microsoft Foundry — making it easier for developers to build high-trust, high-impact AI applications at scale.

We’re also expanding our support for OpenAI’s newest innovations, the GPT-5.1 series and the latest Codex models, along with FLUX.2 from Black Forest Labs coming soon. These frontier models complement our growing library of domain-specific intelligence — from new NVIDIA NIM microservices that push the boundaries of AI development and deployment with Llama 3.3 Nemotron Super 49B v1.5, to Boltz-2 and Evo 2 40B driving breakthroughs in drug discovery and life sciences. We also have new managed compute models from NVIDIA, MongoDB’s Voyage AI, RosettaFold3, and Inception AI Labs, expanding Foundry’s reach across edge intelligence, protein sequencing, and more.

Across reasoning, retrieval, code, science, and creative generation, Foundry brings all of this innovation together in a unified, governed platform. The result: one of the broadest and most trusted AI ecosystems — giving enterprises freedom of choice, with the security, observability, and cost transparency required to operate at scale. 

 

  2. Model Router (GA) — Intelligence That Simplifies Choice

Choosing the right model no longer needs to be complex. Starting today, the model router in Microsoft Foundry is generally available. Model router automatically directs prompts to the optimal model based on real-time benchmarking across latency, accuracy, and cost. In early customer deployments, the model router delivered up to 50 percent lower latency while improving quality by up to 15 percent across workloads — all without code changes.

“We saw a dramatic and immediate reduction in end-user latency, from a volatile 10–20 seconds down to a stable 3–5 seconds. This wasn’t just a marginal improvement; it was a greater than 50% reduction in average response time, which directly translates to a better user experience.”

- Gustavo Azambuja, Senior AI Architect, Perficient 

Today, we are pleased to announce that model router now supports a broader range of models, including the GPT-4 family, GPT-5 family, gpt-oss, DeepSeek-V3.1, Llama-4-Maverick-Instruct, Grok-4, and Grok-4-fast. This expanded catalog allows the model router to route requests intelligently across models to achieve the best balance of cost, latency, and accuracy.

Developers can now fine-tune routing behavior with three distinct modes:  

  • Cost Saving: Prioritizes lightweight models for simpler tasks, reducing costs without compromising baseline accuracy.  
  • Quality: Routes input to the optimal models for maximum precision—ideal for high-stakes domains like legal, healthcare, or finance.  
  • Balanced: Dynamically optimizes cost and quality for general-purpose workloads, delivering the best of both worlds.  

Along with new model subset options, these modes make it easier to align AI performance with business priorities while maintaining flexibility and control. Now integrated into Foundry Agent Service, the model router empowers developers to build sophisticated, multi-modal conversational experiences without complex code. It’s model choice made simple, smart, and scalable.
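To make the three routing modes concrete, here is a minimal sketch of mode-based routing as a scoring policy. The model names, costs, and quality scores below are invented for illustration only; the actual model router scores candidates with live latency, accuracy, and cost benchmarks rather than a static table.

```python
# Hypothetical illustration of mode-based routing. Model names, costs,
# and quality scores are invented; the real model router relies on
# real-time benchmarks, not static tables like this one.
MODELS = {
    "small-fast": {"cost": 1.0, "quality": 0.70},
    "mid-tier": {"cost": 4.0, "quality": 0.85},
    "frontier": {"cost": 12.0, "quality": 0.97},
}

BASELINE_QUALITY = 0.65  # minimum acceptable quality in any mode

def route(mode: str) -> str:
    """Return the model a given routing mode would pick."""
    eligible = {n: m for n, m in MODELS.items() if m["quality"] >= BASELINE_QUALITY}
    if mode == "cost_saving":
        # cheapest model that still clears the quality baseline
        return min(eligible, key=lambda n: eligible[n]["cost"])
    if mode == "quality":
        # maximum precision; cost is secondary
        return max(eligible, key=lambda n: eligible[n]["quality"])
    # balanced: quality discounted by a small cost penalty
    return max(eligible, key=lambda n: eligible[n]["quality"] - 0.02 * eligible[n]["cost"])

print(route("balanced"))  # mid-tier wins the quality-vs-cost trade-off
```

The point of the sketch is the shape of the decision, not the numbers: each mode is just a different objective applied to the same candidate pool.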

 

  3. Priority Processing — Performance When It Matters Most 

Every enterprise runs AI differently — and Microsoft Foundry is designed to meet those needs with deployment options that align to each customer’s performance, compliance, and cost requirements. Customers can run models globally for maximum availability, in Data Zones (US or EU) for residency needs, or in specific regions when local proximity is essential. 

Within each location, Foundry offers a full range of service levels: 

  • Standard (pay-as-you-go): flexible, fast inference with dynamic scaling. 
  • Provisioned Throughput (PTU): guaranteed performance and predictable, reserved cost. 
  • Batch Deployments: asynchronous processing at up to 50% lower cost than Global Standard for workloads that can wait. 

New at Ignite, we’re introducing Priority Processing (Public Preview) — a premium, SLA-backed low-latency option that delivers consistent responsiveness without throughput commitments, subject to scaling rules. It’s built for latency-sensitive scenarios like real-time decisioning, healthcare triage, financial transactions, and live customer experiences. 

At launch, Priority Processing supports GPT-4.1 in both Global and Data Zone under Standard (pay-as-you-go), with GPT-5 and additional frontier models coming soon. 
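One way to think about picking among these service levels is as a short decision rule over workload traits. The helper below is a hypothetical sketch: the tier names echo the options described above, but the function and its logic are illustrative, not part of any Foundry SDK.

```python
# Hypothetical helper mapping workload traits to a Foundry service level.
# Tier names mirror the options described in this post; the function and
# decision logic are an illustration, not a Foundry API.
def choose_tier(latency_sensitive: bool, can_wait: bool, reserved_throughput: bool) -> str:
    if can_wait:
        return "batch"  # asynchronous, up to 50% cheaper than Global Standard
    if reserved_throughput:
        return "provisioned_throughput"  # PTU: guaranteed performance, predictable cost
    if latency_sensitive:
        return "priority_processing"  # SLA-backed low latency, pay-as-you-go
    return "standard"  # flexible pay-as-you-go with dynamic scaling

# A real-time decisioning workload lands on Priority Processing:
print(choose_tier(latency_sensitive=True, can_wait=False, reserved_throughput=False))
```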

“We want great performance without long‑term commitments. Priority Processing gives us a premium, pay‑as‑you‑go lane that stays responsive at peak and spills over automatically when capacity is tight. It’s the practical way to deliver low‑latency AI at scale while keeping our roadmap flexible.”  

- Sridhar Reddy, Vice President, Engineering, Harvey  

Across all these options, Foundry gives enterprises performance, flexibility, and governance in one trusted platform — enabling them to run AI securely, efficiently, and entirely on their terms. 

 

  4. Fine-tuning and Customization — Models That Learn Your Business 

Not every organization needs to build a model from scratch — but every organization needs models tailored to its own data, tone, and workflows. That’s why Microsoft Foundry supports fine-tuning and adapter-based customization across select hosted and partner models and integrates seamlessly with custom tools your teams already use or have developed (see our Tools Blog). 

“Response times that were previously five or six seconds came down to one and a half to two seconds on average.” This approach made the system more efficient, and the 50 percent reduction in latency made conversations with Discovery AI feel seamless.

- Stuart Emslie, Head of Actuarial and Data Science, Discovery Bank  

We’re making every aspect of the fine-tuning process faster, smarter, and more accessible — from data prep to deployment.  

New capabilities include: 

  • Synthetic data generation to automatically create training sets from your company's documents and code. 
  • Developer Training tier to run your training jobs on spot capacity at a 50% discount. 
  • New OSS models, including Ministral 3B, Qwen3 32B (coming soon), Llama 3.3 70B (coming soon), and gpt-oss 20B (coming soon) — all in the same intuitive UX as Foundry OpenAI models. 
  • For your most advanced use cases, Agentic Reinforcement Fine-tuning (RFT) with GPT-5, now available in private preview. Agentic RFT enables you to customize tool calling during GPT-5’s chain-of-thought reasoning, training the model to act more like an autonomous agent — deciding which tools to use, in what order, and under what conditions — so it can plan, reason, and execute multi-step workflows with purpose-driven intent. 
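As a concrete starting point for data prep, fine-tuning chat models typically consumes JSONL records, one chat exchange per line. The sketch below packages Q&A pairs into that format; the example pairs and the `training.jsonl` file name are placeholders, standing in for content extracted from your documents or produced by the synthetic data generation capability above.

```python
import json

# Sketch: turn Q&A pairs into the JSONL chat format commonly used for
# fine-tuning chat models. The pairs below are placeholders standing in
# for content extracted (or synthesized) from company documents.
pairs = [
    ("What is our refund window?", "Refunds are accepted within 30 days of purchase."),
    ("Which regions do we ship to?", "We currently ship to the US, EU, and UK."),
]

def to_jsonl(pairs, system_prompt="You are a helpful company assistant."):
    """One JSON object per line, each holding a full chat exchange."""
    lines = []
    for question, answer in pairs:
        record = {"messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]}
        lines.append(json.dumps(record))
    return "\n".join(lines)

# Write the training file a fine-tuning job would consume.
with open("training.jsonl", "w", encoding="utf-8") as f:
    f.write(to_jsonl(pairs))
```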

Innovating with Azure AI Foundry 

AI has entered its next chapter — one defined by performance, responsibility, and measurable outcomes. With innovations like Model Router GA, Priority Processing, and new models from Cohere, BFL, and NVIDIA, Microsoft Foundry empowers every organization to turn the world’s most advanced models into trusted, production-ready solutions. Whether optimizing latency, ensuring compliance, or scaling creativity, Foundry gives you the freedom to build with confidence — and the control to deploy with purpose. 

Try all the new models at https://ai.azure.com 

Check out these sessions at Microsoft Ignite 2025 that dive deeper into all the latest announcements. 

Updated Nov 18, 2025
Version 7.0