Today, we’re announcing the availability of NVIDIA Nemotron 3 Super NIM in Microsoft Foundry, expanding the set of open, high‑performance reasoning models available to developers building agentic AI systems. With long‑context reasoning, efficient inference, and an open model foundation, Nemotron 3 Super gives teams greater flexibility as they design systems that move beyond chat into autonomous, multi‑step workflows.
As teams move beyond simple chatbots toward long‑running, multi‑step agents, they need models that can reason deeply, handle massive context, and operate efficiently at scale. Nemotron 3 Super is purpose‑built for these agentic workloads, combining strong reasoning capabilities with architectural innovations designed to help reduce cost and latency.
Model Overview
NVIDIA Nemotron 3 Super is an open, high‑capacity reasoning model optimized for complex agentic AI workflows. It is designed to address two core challenges in multi‑agent systems: context explosion and the “thinking tax” that comes from continuous deep reasoning. The thinking tax is the extra cost and latency incurred when agents must reason step by step; Nemotron 3 Super is designed to make that kind of reasoning far cheaper to run.
With a native 1‑million‑token context window and a hybrid mixture‑of‑experts (MoE) architecture, Nemotron 3 Super enables agents to retain long‑term state, reason across large documents, and execute multi‑step tasks with higher efficiency. The model is fully open, giving teams flexibility to customize and adapt to their specific domains.
- Up to 4x faster token generation vs. Nemotron 2
- Predictable "Thinking Budget" for inference
- 1M-token context for complex workflows
- Top accuracy on agentic benchmarks
- Fully open for control and flexibility
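The “thinking budget” above is a per‑request cap on how many tokens the model spends reasoning before it answers. As a sketch of what that looks like in practice, the snippet below builds an OpenAI‑style chat payload with a hypothetical `max_thinking_tokens` field; the model ID and the budget parameter name are illustrative assumptions, so check the NIM model card for the actual names.

```python
import json

def build_request(prompt: str, thinking_budget: int = 1024, max_tokens: int = 512) -> dict:
    # "max_thinking_tokens" is a hypothetical field illustrating a per-request
    # thinking budget; the real parameter name is defined by the NIM model card.
    return {
        "model": "nvidia/nemotron-3-super",  # placeholder model ID
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "max_thinking_tokens": thinking_budget,
    }

payload = build_request("Plan the next agent step.", thinking_budget=2048)
print(json.dumps(payload, indent=2))
```

Capping reasoning tokens per subtask is what makes inference cost predictable across a long agent run.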
Key Capabilities
- Run agentic AI more efficiently at scale
Nemotron 3 Super is designed to address the high cost and latency of multi‑agent systems by delivering up to 5× higher throughput compared to the previous Nemotron Super model, making complex agentic workflows more practical to operate in production.
- Maintain coherence across long‑running workflows
With a native 1‑million‑token context window, customers can retain full workflow state, tool outputs, and intermediate reasoning over long tasks—helping prevent goal drift that commonly occurs in multi‑agent systems.
- Reduce the “thinking tax” in multi‑agent systems
Nemotron 3 Super is built specifically to balance deep reasoning with efficiency, enabling agents to reason at each step without the prohibitive cost of using large models continuously across every subtask.
- Support advanced agentic use cases
The model is intended for complex, multi‑step agentic applications such as research, software development agents, and large‑scale enterprise automation, where both reasoning accuracy and efficiency are required.
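A practical consequence of the 1‑million‑token context window is that a long‑running agent can decide when accumulated state still fits before each call, rather than summarizing prematurely. The helper below is a minimal sketch of that check, assuming a crude characters‑per‑token heuristic (the real count depends on the model's tokenizer).

```python
def approx_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return max(1, len(text) // 4)

def fits_in_context(history: list[str], limit: int = 1_000_000, reserve: int = 8_000) -> bool:
    # Reserve headroom for the model's reply so the agent knows when to
    # summarize or truncate accumulated workflow state and tool outputs.
    used = sum(approx_tokens(chunk) for chunk in history)
    return used + reserve <= limit
```

In production you would use the model's actual tokenizer for the count; the control flow is the point here.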
Use Cases
Nemotron 3 Super is suited for agentic and reasoning‑heavy scenarios, including:
- Research and deep literature analysis agents
- Software development and code‑analysis agents
- Enterprise workflow automation and orchestration
- Long‑context document analysis and synthesis
NVIDIA Nemotron 3 Super on Microsoft Foundry
Through Microsoft Foundry, developers can access Nemotron 3 Super alongside a broad catalog of open models through a unified platform for discovery, evaluation, and deployment, allowing them to operate with enterprise trust and scale.
Microsoft Foundry serves as a unified system of record and enterprise control plane for AI, bringing together models, agents, evaluation, deployment, and governance into a single experience. With Microsoft Foundry, teams can move from experimentation to production with confidence, using the models and frameworks that best fit their requirements, while relying on a consistent operational foundation.
Pricing
The pricing breakdown consists of Azure compute charges plus a flat per‑GPU fee for the NVIDIA AI Enterprise license required to use the NIM software.
- Pay-as-you-go (per GPU hour)
- NIM surcharge: $1 per GPU hour
- Azure compute charges also apply, based on deployment configuration
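The formula is simple: total hourly cost is the Azure compute rate plus the $1/GPU‑hour NIM surcharge, multiplied by the number of GPUs. The sketch below uses a hypothetical compute rate, since actual Azure rates vary by VM SKU and region.

```python
NIM_SURCHARGE_PER_GPU_HOUR = 1.00  # flat NVIDIA AI Enterprise fee, per the pricing above

def hourly_cost(num_gpus: int, azure_rate_per_gpu_hour: float) -> float:
    # Total = Azure compute + NIM surcharge, both billed per GPU-hour.
    # The Azure rate here is illustrative, not a quoted price.
    return num_gpus * (azure_rate_per_gpu_hour + NIM_SURCHARGE_PER_GPU_HOUR)

# e.g. 8 GPUs at a hypothetical $3.50/GPU-hour compute rate:
print(hourly_cost(8, 3.50))  # 8 * (3.50 + 1.00) = 36.0
```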
Why use Managed Compute?
Managed Compute is a deployment option within Microsoft Foundry Models that lets you run large language models (LLMs), small language models (SLMs), Hugging Face models, and custom models fully hosted on Azure infrastructure. It is a powerful option for models not available via standard pay‑as‑you‑go endpoints. It gives you:
- Custom model support: Deploy open-source or third-party models
- Infrastructure flexibility: Choose your own GPU SKUs (NVIDIA A10, A100, H100)
- Detailed control: Configure inference servers, protocols, and advanced settings
- Full integration: Works with Azure ML SDK, CLI, Prompt Flow, and REST APIs
- Enterprise-ready: Supports VNet, private endpoints, quotas, and scaling policies
NVIDIA NIM Microservices on Azure
These models are available as NVIDIA NIM™ microservices on Microsoft Foundry. NVIDIA NIM, part of NVIDIA AI Enterprise, is a set of easy-to-use microservices designed for secure, reliable deployment of high-performance AI model inferencing. NIM microservices are pre-built, containerized AI endpoints that simplify deployment and scaling across environments, allowing developers to run models securely and efficiently in the cloud.
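Once deployed, a NIM exposes an OpenAI-compatible chat completions route. The stdlib-only sketch below builds such a request; the endpoint URL and API key are placeholders you would take from the Foundry deployment page, and the model ID is an assumption (the model card lists the real one).

```python
import json
import urllib.request

def build_chat_request(endpoint_url: str, api_key: str, prompt: str) -> urllib.request.Request:
    # Endpoint URL and key come from your Foundry deployment; the model ID
    # below is a placeholder, not the authoritative identifier.
    body = json.dumps({
        "model": "nvidia/nemotron-3-super",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }).encode()
    return urllib.request.Request(
        f"{endpoint_url}/v1/chat/completions",
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
    )

# Sending the request (requires a live deployment):
# with urllib.request.urlopen(build_chat_request(url, key, "Hello")) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```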
How to Get Started in Microsoft Foundry
- Explore Microsoft Foundry: Begin by accessing the Microsoft Foundry portal, then follow the steps below.
- Navigate to ai.azure.com.
- In the top left, select an existing project backed by a Hub resource. If you do not have a Hub project, create one using the “+ Create New” link.
- Choose the AI Hub resource.
- Deploy with NIM Microservices: Use NVIDIA’s optimized containers for secure, scalable deployment.
- Select Model Catalog from the left sidebar menu:
- In the "Collections" filter, select NVIDIA to see all the NIM microservices that are available on Microsoft Foundry.
- Select the NIM you want to use: NVIDIA Nemotron 3 Super.
- Click Deploy.
- Choose a deployment name and the virtual machine (VM) type for your deployment. VM SKUs supported for the selected NIM, also listed on the model card, will be preselected. Note that this step requires sufficient quota in your Azure subscription for the selected VM type; if needed, follow the instructions to request a service quota increase.
- Use the NVIDIA NeMo Agent Toolkit, which is designed to orchestrate, monitor, and optimize collaborative AI agents.
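The deployment step above boils down to three choices: a deployment name, a supported VM SKU, and an instance count covered by your quota. A minimal validation sketch of that step follows; the SKU names are illustrative examples, and the authoritative list is the one shown on the NIM's model card in the Foundry catalog.

```python
def nim_deployment_config(deployment_name: str, vm_sku: str, instance_count: int = 1) -> dict:
    # SKU names here are illustrative; the NIM's model card in the Foundry
    # catalog is the authoritative list of supported VM SKUs.
    supported_skus = {"Standard_NC24ads_A100_v4", "Standard_ND96isr_H100_v5"}
    if vm_sku not in supported_skus:
        raise ValueError(f"{vm_sku} is not listed as supported for this NIM")
    return {
        "name": deployment_name,
        "instance_type": vm_sku,
        "instance_count": instance_count,
    }
```

Validating the SKU up front mirrors what the Foundry portal does when it preselects supported VM types.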
Note about the License
Users are responsible for compliance with the terms of the NVIDIA AI Product Agreement.
Learn More