Today, we’re excited to announce that Kimi K2 Thinking, Moonshot AI’s most advanced open-source reasoning model, is now available in Microsoft Foundry as a model sold directly by Azure.
Kimi K2 Thinking represents a major leap forward in agentic intelligence. Designed as a true thinking agent, it performs multi-step reasoning, orchestrates long chains of tool calls, and maintains stable, goal-directed behavior across hundreds of steps. Now, this breakthrough capability is available with Azure-grade security, observability, and enterprise integration.
With this launch, Microsoft Foundry continues to expand its leadership as the cloud platform with the widest, most diverse model selection, spanning frontier models, open models, and now state-of-the-art thinking systems.
Kimi K2 Thinking: Open, Scalable, Multi-Step Reasoning
Kimi K2 Thinking is the latest and most capable evolution of Moonshot AI’s open reasoning models. Built as a long-horizon, tool-driven agent, it was trained end-to-end to interleave chain-of-thought reasoning with dynamic function calling.
What makes K2 Thinking different?
- Unprecedented Deep Thinking
Maintains coherent reasoning across 200–300 sequential tool calls, far beyond typical models that drift after 30–50 calls.
- 256K Context Window
Supports massive context for research, codebases, legal analysis, multi-document workflows, and stateful agents.
- Native INT4 Quantization (QAT)
Unlocks lossless 2× speed-ups and significantly lower memory usage, which is critical for thinking models with very long decoding paths.
- End-to-End Trained Agentic Behavior
Designed for workflows like autonomous research, multi-stage coding tasks, data investigations, and iterative document drafting.
This combination of scale, depth, and efficiency makes K2 Thinking a powerful addition to the Foundry model catalog, especially for customers building long-horizon, tool-rich agents.
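To make the long-horizon, tool-driven behavior concrete, here is a minimal sketch of the kind of loop such an agent runs: the model repeatedly picks a tool, observes the result, and folds it back into its context. The `TOOLS` registry, the `plan` input, and `run_agent` are all hypothetical stand-ins for illustration, not Moonshot's or Foundry's API.

```python
# Hypothetical tool registry: name -> callable. In a real agent these would
# be functions the model can invoke (search, code execution, file I/O, etc.).
TOOLS = {
    "add": lambda args: {"result": args["a"] + args["b"]},
    "multiply": lambda args: {"result": args["a"] * args["b"]},
}

def run_agent(plan, max_steps=300):
    """Drive a multi-step tool loop. `plan` stands in for the model's
    step-by-step decisions; a thinking model emits one tool call per step
    and conditions the next step on every observation so far."""
    context = []  # accumulated tool calls and observations
    for step, (tool, args) in enumerate(plan):
        if step >= max_steps:  # K2 Thinking stays coherent for 200-300 steps
            break
        observation = TOOLS[tool](args)
        context.append({"step": step, "tool": tool, "observation": observation})
    return context

# Toy "plan": compute (2 + 3) * 4 across two sequential tool calls.
trace = run_agent([("add", {"a": 2, "b": 3}),
                   ("multiply", {"a": 5, "b": 4})])
```

The point of the sketch is the shape of the loop, not the tools: stability means the model keeps choosing sensible next calls even hundreds of iterations in, rather than drifting off-goal.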
Performance and Efficiency by Design
Kimi K2 Thinking leverages a Mixture-of-Experts architecture to get the best of both worlds: very large capacity when needed, but efficient runtime cost by activating only a subset of experts per token.
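The routing idea behind that trade-off can be sketched in a few lines: a router scores all experts for each token, but only the top-k actually run. This is a generic, minimal top-k gate in plain Python for illustration, not Moonshot's implementation.

```python
import math

def top_k_route(router_logits, k=2):
    """Pick the k highest-scoring experts for one token and renormalize
    their gate weights, so only a small fraction of parameters is active."""
    probs = [math.exp(l) for l in router_logits]
    total = sum(probs)
    probs = [p / total for p in probs]
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in ranked)
    return {i: probs[i] / norm for i in ranked}

def moe_layer(x, experts, router_logits, k=2):
    """Weighted sum of the selected experts' outputs; experts that are not
    selected contribute no compute for this token."""
    gates = top_k_route(router_logits, k)
    return sum(w * experts[i](x) for i, w in gates.items())

# Eight toy experts (scalar functions); only two run per token.
experts = [lambda x, s=s: s * x for s in range(8)]
y = moe_layer(2.0, experts, router_logits=[0.1, 2.0, 0.3, 1.5, 0, 0, 0, 0])
```

Capacity scales with the total number of experts, while per-token cost scales only with k, which is how a ~1T-parameter model can run with a far smaller active footprint.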
Key efficiency characteristics:
- 32B active parameters per inference with ~1T total parameters, delivering strong performance without the cost profile of a monolithic trillion-parameter model.
- Native INT4 quantization support, enabling significant speedups and better hardware utilization while maintaining quality.
- A training recipe tuned for stability at scale, with techniques like MuonClip and large-scale agentic post-training.
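The memory savings from INT4 are easy to see with a toy round-trip: weights are stored as 4-bit integers plus a shared scale, then dequantized on the fly. This is an illustrative symmetric quantizer only; real quantization-aware training (QAT) learns the scales during training, and this is not Moonshot's recipe.

```python
def quantize_int4(weights):
    """Symmetric per-tensor INT4 quantization: map floats to integers in
    [-8, 7] with one shared scale, storing ~4 bits per weight instead of 16+."""
    scale = max(abs(w) for w in weights) / 7.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int4(q, scale):
    """Recover approximate float weights from the 4-bit codes."""
    return [v * scale for v in q]

w = [0.12, -0.5, 0.33, 0.07, -0.21, 0.49]
q, s = quantize_int4(w)
w_hat = dequantize_int4(q, s)  # close to w; error bounded by half a step
```

For a thinking model that decodes very long chains, the smaller weights and faster memory traffic compound across every one of those decode steps.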
Pricing

| Model | Deployment Type | Azure Resource Region | Price /1M Tokens |
| --- | --- | --- | --- |
| Kimi-K2-Thinking | Global Standard | All regions (check this page for region details) | Input: $0.60; Output: $2.50 |
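As a quick worked example of the rates above (input $0.60 and output $2.50 per 1M tokens), a simple helper can estimate per-request cost; the function and the token counts are illustrative, not an official billing calculator.

```python
def request_cost(input_tokens, output_tokens,
                 input_rate=0.60, output_rate=2.50):
    """Estimate cost in USD for one request; rates are per 1M tokens
    (Kimi-K2-Thinking Global Standard pricing)."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# Example: a long-horizon agent run with 400K input and 60K output tokens.
cost = request_cost(400_000, 60_000)  # 0.24 + 0.15 = 0.39 USD
```

Note that thinking models can produce long reasoning traces, so output tokens often dominate the bill even at the lower input rate.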
A New Chapter for Open Reasoning Models in the Enterprise
Kimi K2 Thinking represents a new wave of open, high-performance reasoning models that don’t just answer but truly think through complex problems with tools and data. Bringing it into Microsoft Foundry is about more than adding one more model tile; it’s about giving customers another powerful way to build trustworthy, long-running, tool-driven AI systems on Azure.