Microsoft Foundry Blog
5 MIN READ

NVIDIA Nemotron 3 Super Now Available on Microsoft Foundry: Open, Efficient Reasoning for Agentic AI

vaidyas
Microsoft
Mar 16, 2026

Today, we’re announcing the availability of the NVIDIA Nemotron 3 Super NIM in Microsoft Foundry, expanding the set of open, high-performance reasoning models available to developers building agentic AI systems. With long-context reasoning, efficient inference, and an open model foundation, Nemotron 3 Super gives developers greater flexibility as they design systems that move beyond chat into autonomous, multi-step workflows.

As teams move beyond simple chatbots toward long‑running, multi‑step agents, they need models that can reason deeply, handle massive context, and operate efficiently at scale. Nemotron 3 Super is purpose‑built for these agentic workloads, combining strong reasoning capabilities with architectural innovations designed to help reduce cost and latency. 

Model Overview  

NVIDIA Nemotron 3 Super is an open, high-capacity reasoning model optimized for complex agentic AI workflows. It is designed to address two core challenges in multi-agent systems: context explosion and the “thinking tax” that comes from continuous deep reasoning. The thinking tax is the extra cost and latency incurred when agents must reason step by step at every turn; Nemotron 3 Super is built to make that kind of reasoning much cheaper to run.

With a native 1‑million‑token context window and a hybrid mixture‑of‑experts (MoE) architecture, Nemotron 3 Super enables agents to retain long‑term state, reason across large documents, and execute multi‑step tasks with higher efficiency. The model is fully open, giving teams flexibility to customize and adapt to their specific domains. 

  • Up to 4x faster token generation vs. Nemotron 2 
  • Predictable "Thinking Budget" for inference 
  • 1M-token context for complex workflows 
  • Top accuracy on agentic benchmarks 
  • Fully open for control and flexibility 

Key Capabilities 

  • Run agentic AI more efficiently at scale 
    Nemotron 3 Super is designed to address the high cost and latency of multi‑agent systems by delivering up to 5× higher throughput compared to the previous Nemotron Super model, making complex agentic workflows more practical to operate in production.  
  • Maintain coherence across long-running workflows 
    With a native 1-million-token context window, customers can retain full workflow state, tool outputs, and intermediate reasoning over long tasks, helping prevent the goal drift that commonly occurs in multi-agent systems (a minimal sketch of this pattern follows this list). 
  • Reduce the “thinking tax” in multi-agent systems 
    Nemotron 3 Super is built specifically to balance deep reasoning with efficiency, enabling agents to reason at each step without the prohibitive cost of using large models continuously across every subtask. 
  • Support advanced agentic use cases 
    The model is intended for complex, multi‑step agentic applications such as research, software development agents, and large‑scale enterprise automation, where both reasoning accuracy and efficiency are required. 
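To make the long-context pattern concrete, here is a minimal sketch (not official NVIDIA or Microsoft sample code) of an agent loop that keeps the entire workflow state, including every tool result, in one running message history instead of truncating it. The endpoint URL, API key, model identifier, and run_tool helper are hypothetical placeholders; substitute the values from your own deployment.

```python
# Minimal sketch of a long-context agent loop. The endpoint URL, API key,
# model id, and run_tool() are hypothetical placeholders, not published values.
from openai import OpenAI

client = OpenAI(
    base_url="https://<your-endpoint>/v1",  # placeholder: your deployed endpoint
    api_key="<your-key>",
)

def run_tool(request: str) -> str:
    # Placeholder: execute whatever tool the model asked for and return its output.
    return f"[tool output for: {request[:80]}]"

# Keep the full workflow state in one message history instead of truncating it;
# the 1M-token context window is what makes this practical over long tasks.
messages = [
    {"role": "system", "content": "You are a research agent. Say FINAL ANSWER when done."},
    {"role": "user", "content": "Review the attached filings and draft a risk summary."},
]

for _ in range(8):  # bound the number of agent steps
    resp = client.chat.completions.create(
        model="nvidia/nemotron-3-super",  # placeholder model id; see the model card
        messages=messages,
        max_tokens=1024,
    )
    reply = resp.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    if "FINAL ANSWER" in reply:
        break
    # Feed tool output back in full rather than summarizing it away.
    messages.append({"role": "user", "content": run_tool(reply)})
```

The point of the sketch is the accumulation step: with a large context window, the agent can carry full intermediate reasoning and tool output forward rather than compressing it between steps.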

Use Cases 

Nemotron 3 Super is suited for agentic and reasoning‑heavy scenarios, including: 

  • Research and deep literature analysis agents 
  • Software development and code‑analysis agents 
  • Enterprise workflow automation and orchestration 
  • Long‑context document analysis and synthesis 

NVIDIA Nemotron 3 Super on Microsoft Foundry 

Through Microsoft Foundry, developers can access Nemotron 3 Super alongside a broad catalog of open models, using a unified platform for discovery, evaluation, and deployment that allows them to operate with enterprise trust and scale. 

Microsoft Foundry serves as a unified system of record and enterprise control plane for AI, bringing together models, agents, evaluation, deployment, and governance into a single experience. With Microsoft Foundry, teams can move from experimentation to production with confidence, using the models and frameworks that best fit their requirements, while relying on a consistent operational foundation.  

Pricing 

The pricing breakdown consists of Azure compute charges plus a flat per-GPU-hour fee for the NVIDIA AI Enterprise license that is required to use the NIM software. 

  • Pay-as-you-go (per GPU hour) 
  • NIM surcharge: $1 per GPU hour 
  • Azure compute charges also apply based on deployment configuration (a worked cost example follows below) 
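As a rough illustration of how these components combine, the sketch below estimates a monthly bill; the Azure compute rate used is a made-up placeholder rather than a published price, so substitute the current rate for your VM SKU and region.

```python
# Illustrative cost estimate only. The Azure compute rate below is a placeholder,
# not a published price; the NIM surcharge is the flat $1 per GPU hour noted above.
azure_compute_per_gpu_hour = 4.00   # placeholder: look up your VM SKU and region
nim_surcharge_per_gpu_hour = 1.00   # NVIDIA AI Enterprise license fee per GPU hour
gpu_count = 2                       # GPUs in the chosen VM SKU
hours = 24 * 30                     # one month of continuous operation

monthly_estimate = (azure_compute_per_gpu_hour + nim_surcharge_per_gpu_hour) * gpu_count * hours
print(f"Estimated monthly cost: ${monthly_estimate:,.2f}")
```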

Why use Managed Compute?  

Managed Compute is a deployment option within Microsoft Foundry Models that lets you run large language models (LLMs), small language models (SLMs), Hugging Face models, and custom models fully hosted on Azure infrastructure. Azure Managed Compute is a powerful deployment option for models that are not available via standard pay-as-you-go endpoints. It gives you: 

  • Custom model support: Deploy open-source or third-party models   
  • Infrastructure flexibility: Choose your own GPU SKUs (NVIDIA A10, A100, H100)   
  • Detailed control: Configure inference servers, protocols, and advanced settings   
  • Full integration: Works with Azure ML SDK, CLI, Prompt Flow, and REST APIs (see the SDK sketch after this list)   
  • Enterprise-ready: Supports VNet, private endpoints, quotas, and scaling policies  
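For teams that prefer scripting over the portal, a Managed Compute deployment can be driven with the Azure ML Python SDK (azure-ai-ml). The sketch below is a generic managed online endpoint deployment, not an official walkthrough: the subscription, workspace, model asset path, and GPU SKU are placeholders, and the exact registry path and supported SKUs should be taken from the model card in the Foundry catalog.

```python
# Sketch of a Managed Compute deployment with the Azure ML Python SDK (azure-ai-ml).
# Subscription, resource group, workspace, model asset path, and SKU are placeholders.
from azure.ai.ml import MLClient
from azure.ai.ml.entities import ManagedOnlineDeployment, ManagedOnlineEndpoint
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<hub-project-workspace>",
)

# Create the managed online endpoint that will front the NIM container.
endpoint = ManagedOnlineEndpoint(name="nemotron-3-super-ep", auth_mode="key")
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Deploy the catalog model onto a supported GPU SKU (check the model card and your quota).
deployment = ManagedOnlineDeployment(
    name="nemotron-3-super-deploy",
    endpoint_name="nemotron-3-super-ep",
    model="azureml://registries/<registry>/models/<nemotron-3-super-nim>/versions/<version>",
    instance_type="Standard_NC24ads_A100_v4",  # example GPU SKU; confirm availability first
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()
```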

NVIDIA NIM Microservices on Azure  

Nemotron 3 Super, like other NVIDIA models in the catalog, is available as an NVIDIA NIM™ microservice on Microsoft Foundry. NVIDIA NIM, part of NVIDIA AI Enterprise, is a set of easy-to-use microservices designed for secure, reliable deployment of high-performance AI model inferencing. NIM microservices are pre-built, containerized AI endpoints that simplify deployment and scaling across environments, allowing developers to run models securely and efficiently in the cloud. 
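NIM microservices for LLMs expose an OpenAI-compatible HTTP API, so a single chat completion against a deployed endpoint looks roughly like the sketch below. The endpoint URL, API key, and model identifier are placeholders; use the values shown on your deployment’s details page.

```python
# Single chat completion against a NIM microservice's OpenAI-compatible API.
# Endpoint URL, key, and model id are placeholders for your own deployment.
import requests

url = "https://<your-endpoint>/v1/chat/completions"
headers = {"Authorization": "Bearer <your-key>", "Content-Type": "application/json"}
payload = {
    "model": "nvidia/nemotron-3-super",  # placeholder model id; see the model card
    "messages": [
        {"role": "user", "content": "Outline a migration plan from a monolith to microservices."}
    ],
    "max_tokens": 512,
    "temperature": 0.2,
}

resp = requests.post(url, headers=headers, json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```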

How to Get Started in Microsoft Foundry  

  • Explore Microsoft Foundry: Begin by accessing the Microsoft Foundry portal, then follow the steps below. 
  • From the project selector at the top left, choose an existing project that uses the (Hub) resource provider. If you do not have a Hub project, create one using the “+ Create New” link. 

Create New Hub Project in Microsoft Foundry 

  • Choose AI Hub Resource: Select your AI Hub resource in Microsoft Foundry. 
  • Deploy with NIM Microservices: Use NVIDIA’s optimized containers for secure, scalable deployment.  
  • Select Model Catalog from the left sidebar menu:  
  • In the "Collections" filter, select NVIDIA to see all the NIM microservices that are available on Microsoft Foundry. 

Select NVIDIA under "Collections" in Microsoft Foundry 

  • Click Deploy.  
  • Choose the deployment name and virtual machine (VM) type that you would like to use for your deployment. VM SKUs that are supported for the selected NIM, as specified within the model card, will be preselected. Note that this step requires sufficient quota in your Azure subscription for the selected VM type; if needed, follow the instructions to request a service quota increase. Once the deployment succeeds, you can retrieve its endpoint details programmatically, as sketched after this list. 
  • Optionally, pair your deployment with the NVIDIA NeMo Agent Toolkit, which is designed to orchestrate, monitor, and optimize collaborative AI agents. 
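Once the deployment is running, its scoring URI and keys can be looked up programmatically and plugged into the request sketch shown earlier. This is a generic Azure ML SDK sketch, not an official snippet; the endpoint name and workspace details are placeholders matching the earlier example.

```python
# Look up the scoring URI and auth key for the new deployment; these plug into
# the request sketch shown earlier. Workspace details and names are placeholders.
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<hub-project-workspace>",
)

endpoint = ml_client.online_endpoints.get(name="nemotron-3-super-ep")
keys = ml_client.online_endpoints.get_keys(name="nemotron-3-super-ep")

print("Scoring URI:", endpoint.scoring_uri)
print("Primary key:", keys.primary_key)
```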

  

Note about the License  

Users are responsible for compliance with the terms of the NVIDIA AI Product Agreement. 

Learn More  

 

Updated Mar 16, 2026
Version 3.0