Blog Post

Microsoft Foundry Blog
2 MIN READ

Kimi K2 Thinking Now in Microsoft Foundry

RashaudSavage's avatar
RashaudSavage
Icon for Microsoft rankMicrosoft
Dec 08, 2025

Today, we’re excited to announce that Kimi K2 Thinking, Moonshot AI’s most advanced open-source reasoning model, is now available in Microsoft Foundry as a direct from Azure model.  

Kimi K2 Thinking represents a major leap forward in agentic intelligence. Designed as a true thinking agent, it performs multi-step reasoning, orchestrates long chains of tool calls, and maintains stable, goal-directed behavior across hundreds of steps. Now, this breakthrough capability is available with Azure-grade security, observability, and enterprise integration. 

With this launch, Microsoft Foundry continues to expand its leadership as the cloud platform with the widest, most diverse models, spanning frontier models, open models, and now state-of-the-art thinking systems.

Kimi K2 Thinking: Open, Scalable, Multi-Step Reasoning 

Kimi K2 Thinking is the latest and most capable evolution of Moonshot AI’s open reasoning models. Built as a long-horizon, tool-driven agent, it was trained end-to-end to interleave chain-of-thought reasoning with dynamic function calling. 

What makes K2 Thinking different? 

  • Unprecedented Deep Thinking 
    Maintains coherent reasoning across 200–300 sequential tool calls, far beyond typical models that drift after 30–50 calls. 
  • 256K Context Window 
    Supports massive context for research, codebases, legal analysis, multi-document workflows, and stateful agents. 
  • Native INT4 Quantization (QAT) 
    Unlocks lossless 2× speed-ups and significantly lower memory usage critical for thinking models with very long decoding paths. 
  • End-to-End Trained Agentic Behavior 
    Designed for workflows like autonomous research, multi-stage coding tasks, data investigations, and iterative document drafting. 

This combination of scale, depth, and efficiency makes K2 Thinking a powerful addition to the Foundry models, especially for customers building long-horizon, tool-rich agents. 

Performance and Efficiency by Design 

Kimi K2 Thinking leverages a Mixture-of-Experts architecture to get the best of both worlds: very large capacity when needed, but efficient runtime cost by activating only a subset of experts per token.  

Key efficiency characteristics: 

  • 32B active parameters per inference with ~1T total parameters, delivering strong performance without the cost profile of a monolithic trillion-parameter model.  
  • Native INT4 quantization support, enabling significant speedups and better hardware utilization while maintaining quality.  
  • A training recipe tuned for stability at scale, with techniques like MuonClip and large-scale agentic post-training. 

Pricing 

Model 

Deployment Type 

Azure Resource Region 

Price /1M Tokens 

Kimi-K2-Thinking 

Global Standard 

All regions (Check this page for region details) 

 

Input - $0.6 

output - $2.5 

A New Chapter for Open Reasoning Models in the Enterprise 

Kimi K2 Thinking represents a new wave of open, high-performance reasoning models that don’t just answer but truly think through complex problems with tools and data. Bringing it into Microsoft Foundry is about more than adding one more model tile; it’s about giving customers another powerful way to build trustworthy, long-running, tool-driven AI systems on Azure. 

Updated Dec 08, 2025
Version 1.0
No CommentsBe the first to comment