As AI adoption matures, the conversation is shifting from model capability to system design: how to orchestrate models that deliver the right balance of quality, speed, and cost.
Today, we’re expanding the Microsoft Foundry model catalog with DeepSeek V4 Flash, with DeepSeek V4 Pro coming soon. Together, these models help teams build more adaptable, production-ready AI systems.
Why this matters
Teams building AI applications today face a common challenge: no single model is optimal for every task.
- Complex workflows require deep reasoning and long-context understanding
- High-volume applications demand low latency and cost efficiency
- Production systems need the flexibility to switch models as requirements evolve
With DeepSeek V4 Pro and Flash in Microsoft Foundry, you can match the right model to the right task without changing your infrastructure.
Meet DeepSeek V4 Pro and Flash
DeepSeek V4 Pro – for advanced reasoning and complex workflows
DeepSeek V4 Pro is designed for high-precision tasks that require strong reasoning and deeper context understanding.
Best suited for:
- Multi-step reasoning and analysis
- Complex coding and debugging workflows
- Long-document understanding and synthesis
- Agentic workflows requiring planning and decision-making
Key benefits:
- Strong reasoning performance for complex tasks
- High-quality outputs for enterprise-grade use cases
- Ideal for “high-stakes” workloads where accuracy matters most
DeepSeek V4 Flash – for speed and scale
DeepSeek V4 Flash is optimized for low latency and high-throughput scenarios, making it ideal for real-time and cost-sensitive applications.
Best suited for:
- Chat and conversational experiences
- High-volume content generation
- Classification, summarization, and extraction
- Real-time copilots and assistants
Key benefits:
- Fast response times
- Cost-efficient at scale
- Optimized for production workloads with high concurrency
One platform, multiple models built for production
Both DeepSeek V4 Pro and Flash are available through Microsoft Foundry’s unified platform, so you can:
- Access both models through a single API and endpoint
- Easily switch between models based on workload needs
- Route intelligently across models for cost and performance optimization
- Evaluate and compare models using your own data
- Deploy with enterprise-grade governance and security built in
This means you can move from experimentation to production without re-architecting your system.
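As a sketch of what “a single API and endpoint” can mean in practice, the snippet below builds the same chat-completion request payload for either model, so switching models is just a name change. The request shape follows the common chat-completions convention, and the model identifiers are illustrative placeholders, not confirmed Foundry deployment names:

```python
def build_chat_request(model: str, prompt: str) -> dict:
    """Build one request payload shape that works for either model.

    The model identifiers used by callers (e.g. "deepseek-v4-flash",
    "deepseek-v4-pro") are placeholders for whatever names your
    Foundry deployment actually exposes.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

# Swapping models changes only the "model" field; everything else is identical.
flash_req = build_chat_request("deepseek-v4-flash", "Summarize this support ticket.")
pro_req = build_chat_request("deepseek-v4-pro", "Summarize this support ticket.")
```

Because the payload shape never changes, application code written against Flash can be pointed at Pro (or any future model) without restructuring.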
Build smarter systems with model choice
In practice, most production systems benefit from using multiple models together:
- Use DeepSeek V4 Flash for high-volume, real-time interactions
- Route complex queries to DeepSeek V4 Pro for deeper reasoning
- Combine both in agentic workflows to balance cost and quality dynamically
With Microsoft Foundry, this orchestration becomes seamless, enabling teams to build more efficient and resilient AI systems.
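The routing pattern described above can be sketched as a small policy function. This is a minimal illustration, not Foundry’s router: the complexity signal (a flag plus a crude length estimate) and the model names are assumptions you would replace with your own classifier and deployment names:

```python
def pick_model(prompt: str, needs_reasoning: bool = False,
               max_flash_tokens: int = 2000) -> str:
    """Route complex or very long requests to Pro; send the rest to Flash.

    Model names are illustrative placeholders. The token estimate
    (~4 characters per token) is a rough heuristic, not a tokenizer.
    """
    rough_tokens = len(prompt) // 4
    if needs_reasoning or rough_tokens > max_flash_tokens:
        return "deepseek-v4-pro"
    return "deepseek-v4-flash"

# High-volume, simple request -> Flash; flagged or long request -> Pro.
```

In a real system the `needs_reasoning` signal would come from a lightweight classifier or from the task type (e.g. “debug this codebase” vs. “classify this message”), which is how cost and quality get balanced dynamically.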
Enterprise-ready by design
DeepSeek models in Microsoft Foundry inherit the platform’s enterprise-grade capabilities:
- Security and compliance controls to meet organizational requirements
- Built-in content safety and governance
- Observability and monitoring for usage, latency, and cost
- Flexible deployment options aligned to your infrastructure needs
This ensures you can adopt new models without compromising on trust or control.
Pricing
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
| --- | --- | --- |
| DeepSeek V4 Flash | $1.03 | $4.12 |
| DeepSeek V4 Pro | TBD | TBD |
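Using the Flash rates in the table above, estimating workload cost is simple arithmetic. This is a back-of-the-envelope sketch based only on the published per-token rates; actual bills may include other line items:

```python
# Rates from the pricing table above (USD per 1M tokens).
FLASH_INPUT_PER_M = 1.03
FLASH_OUTPUT_PER_M = 4.12

def flash_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimated DeepSeek V4 Flash cost for a given token volume."""
    return (input_tokens / 1_000_000) * FLASH_INPUT_PER_M \
         + (output_tokens / 1_000_000) * FLASH_OUTPUT_PER_M

# e.g. 10M input + 2M output tokens ≈ $18.54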
Getting started
DeepSeek V4 Flash is now available in Microsoft Foundry. DeepSeek V4 Pro will be available soon.
You can:
- Explore both models in the Foundry model catalog
- Evaluate them using your own datasets
- Start building and deploying applications in minutes
AI is no longer about choosing a single “best” model; it’s about building systems that intelligently balance quality, speed, and cost.
With DeepSeek V4 Pro and Flash in Microsoft Foundry, you get the flexibility to do exactly that, on a platform designed for enterprise-scale AI.