
Azure AI Foundry Blog
9 MIN READ

NVIDIA NIM for NVIDIA Nemotron, Cosmos, & Microsoft Trellis: Now Available in Azure AI Foundry

vaidyas
Oct 28, 2025

We’re excited to announce seven powerful new NVIDIA NIM™ additions to Azure AI Foundry Models, now available on Managed Compute. 

The latest wave of models—NVIDIA Nemotron Nano 9B v2, Llama 3.1 Nemotron Nano VL 8B, Llama 3.3 Nemotron Super 49B v1.5 (coming soon), Cosmos Reason1-7B, Cosmos Predict 2.5 (coming soon), Cosmos Transfer 2.5 (coming soon), and Microsoft Trellis—marks a significant leap forward in intelligent application development. Collectively, these models redefine what’s possible in advanced instruction-following, vision-language understanding, and efficient language modeling, empowering developers to build multimodal, visually rich, and context-aware solutions.

By combining robust reasoning, flexible input handling, and enterprise-grade deployment options, these additions accelerate innovation across industries—from robotics and autonomous vehicles to immersive retail and digital twins—enabling smarter, safer, and more adaptive experiences at scale.  

Meet the Models 

| Model Name | Availability | Size | Primary Use Cases |
| --- | --- | --- | --- |
| NVIDIA Nemotron Nano 9B v2 | Available Now | 9B | Multilingual and code-based reasoning; enterprise AI and productivity agents; scientific reasoning and advanced math; software engineering and tool calling |
| Llama 3.3 Nemotron Super 49B v1.5 | Coming Soon | 49B | Enterprise AI and productivity agents; scientific reasoning and advanced math; software engineering and tool calling |
| Llama 3.1 Nemotron Nano VL 8B | Available Now | 8B | Multimodal vision-language tasks, document intelligence and understanding; mobile and edge AI agents |
| Cosmos Reason1-7B | Available Now | 7B | Robotics: planning and executing tasks with physical constraints; autonomous vehicles: understanding environments and making decisions; video analytics agents: extracting insights and root-cause analysis from video data |
| Cosmos Predict 2.5 | Coming Soon | 2B | Generalist model: world state generation and prediction |
| Cosmos Transfer 2.5 | Coming Soon | 2B | Structural conditioning: physical AI |
| Microsoft TRELLIS by Microsoft Research | Available Now | — | Digital twins: accurate 3D assets from simple prompts; immersive retail: photorealistic product models for AR and virtual try-ons; game and simulation development: production-ready 3D content |

Meet the NVIDIA Nemotron Family 

NVIDIA Nemotron Nano 9B v2: Compact power for high-performance reasoning and agentic tasks 

Nemotron Nano 9B v2 is a high-efficiency large language model built with a hybrid Mamba-Transformer architecture, designed to excel in both reasoning and non-reasoning tasks.   

  • Efficient architecture for high-performance reasoning: Combines Mamba-2 and Transformer components to deliver strong reasoning capabilities with higher throughput. 
  • Extensive multilingual and code capabilities: Trained on diverse language and programming data, it performs exceptionally well across tasks involving natural language (English, German, French, Italian, Spanish and Japanese), code generation, and complex problem solving.  
  • Reasoning Budget Control: Supports runtime “thinking” budget control. During inference, the user can cap how many tokens the model is allowed to "think", helping balance speed, cost, and accuracy. For example, a user can limit the model to 1K or 3K thinking tokens for different use cases, with far better cost predictability. 

Fig 1. provided by NVIDIA 

Nemotron Nano 9B v2 is built from the ground up with training data spanning 15 languages and 43 programming languages, giving it broad multilingual and coding fluency. Its capabilities were sharpened through advanced post-training techniques such as GRPO and DPO, enabling it to reason deeply, follow instructions precisely, and adapt dynamically to different tasks.
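The budget-control idea above can be sketched as a request builder for an OpenAI-compatible NIM endpoint. This is a minimal sketch only: the model id, the `/think` system toggle, and the `max_thinking_tokens` field are assumptions for illustration; check the Nemotron Nano 9B v2 model card for the exact parameter names your deployment exposes.

```python
# Minimal sketch: an OpenAI-compatible chat payload with a capped "thinking"
# budget. The model id and the `max_thinking_tokens` field are assumptions;
# consult the Nemotron Nano 9B v2 model card for the exact parameter names.
import json


def build_chat_request(prompt: str, thinking_budget: int) -> dict:
    """Build a chat-completions payload that caps the model's reasoning tokens."""
    return {
        "model": "nvidia/nemotron-nano-9b-v2",        # assumed model id
        "messages": [
            {"role": "system", "content": "/think"},   # assumed reasoning toggle
            {"role": "user", "content": prompt},
        ],
        "max_tokens": 1024,
        "max_thinking_tokens": thinking_budget,        # hypothetical field
    }


payload = build_chat_request("Plan a three-step database migration.", 3000)
print(json.dumps(payload, indent=2))
```

Sending the same request with a budget of 1000 instead of 3000 would trade some reasoning depth for lower latency and cost, which is exactly the predictability the budget control is designed for.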

 

-> Explore the model card on Azure AI Foundry 

Llama 3.3 Nemotron Super 49B v1.5: High-throughput reasoning at scale 

Llama 3.3 Nemotron Super 49B v1.5 (coming soon) is a significantly upgraded version of Llama 3.3 Nemotron Super 49B v1. It is a large language model derived from Meta's Llama-3.3-70B-Instruct (the reference model), optimized for advanced reasoning, instruction following, and tool use across a wide range of tasks.  

  • Excels in applications such as chatbots, AI agents, and retrieval-augmented generation (RAG) systems 
  • Balances accuracy and compute efficiency for enterprise-scale workloads 
  • Designed to run efficiently on a single NVIDIA H100 GPU, making it practical for real-world applications 

Llama-3.3-Nemotron-Super-49B-v1.5 was trained through a multi-phase process combining human expertise, synthetic data, and advanced reinforcement learning techniques to refine its reasoning and instruction-following abilities. Its impressive performance across benchmarks like MATH500 (97.4%) and AIME 2024 (87.5%) highlights its strength in tackling complex tasks with precision and depth.  

Llama 3.1 Nemotron Nano VL 8B: Multimodal intelligence for edge deployments 

Llama 3.1 Nemotron Nano VL 8B is a compact vision-language model that excels in tasks such as report generation, Q&A, visual understanding, and document intelligence. The model delivers low latency and high efficiency, reducing total cost of ownership (TCO).

This model was trained on a diverse mix of human-annotated and synthetic data, enabling robust performance across multimodal tasks such as document understanding and visual question answering. It achieved strong results on evaluation benchmarks including DocVQA (91.2%), ChartQA (86.3%), AI2D (84.8%), and OCRBenchV2 English (60.1%).

 

 

-> Explore the model card on Azure AI Foundry 

What Sets Nemotron Apart 

NVIDIA Nemotron is a family of open models, datasets, recipes, and tools.

1. Open-source AI technologies: Open models, data, and recipes offer transparency, allowing developers to create trustworthy custom AI for their specific needs, from creating new agents to refining existing applications.

  • Open Weights: NVIDIA Open Model License offers enterprises data control and flexible deployment. 
  • Open Data: Models are trained with transparent, permissively-licensed NVIDIA data, available on Hugging Face, ensuring confidence in use. Additionally, it allows developers to train their high-accuracy custom models with these open datasets. 
  • Open Recipe: NVIDIA shares development techniques, like NAS, hybrid architecture, Minitron, as well as NeMo tools enabling customization or creation of custom models. 

2. Highest Accuracy & Efficiency: Engineered for efficiency, Nemotron delivers industry-leading accuracy in the least amount of time for reasoning, vision, and agentic tasks.

3. Run Anywhere On Cloud: Packaged as NVIDIA NIM microservices for secure and reliable deployment of high-performance AI model inferencing across Azure platforms. 

Meet the Cosmos Family 

NVIDIA Cosmos™ is a world foundation model (WFM) development platform to advance physical AI. At its core are Cosmos WFMs, openly available pretrained multimodal models that developers can use out-of-the-box for generating world states as videos and physical AI reasoning, or post-train to develop specialized physical AI models. 

Cosmos Reason1-7B: Physical AI  

Cosmos Reason1-7B combines chain-of-thought reasoning, flexible input handling for images and video, a compact 7B parameter architecture, and advanced physical world understanding making it ideal for real-time robotics, video analytics, and AI agents that require contextual, step-by-step decision-making in complex environments. 

This model transforms how AI and robotics interact with the real world giving your systems the power to not just see and describe, but truly understand, reason, and make decisions in complex environments like factories, cities, and autonomous vehicles. With its ability to analyze video, plan robot actions, and verify safety protocols, Cosmos Reason1-7B helps developers build smarter, safer, and more adaptive solutions for real-world challenges. 

Cosmos Reason1-7B is physical AI for four embodiments:  

 

Fig.2 Physical AI 

 

Model Strengths 

  • Physical World Reasoning: Leverages prior knowledge, physics laws, and common sense to understand complex scenarios. 
  • Chain-of-Thought (CoT) Reasoning: Delivers contextual, step-by-step analysis for robust decision-making. 
  • Flexible Input: Handles images, video (up to 30 seconds, 1080p), and text with a 16k context window. 
  • Compact & Deployable: 7B parameters runs efficiently from edge devices to the cloud. 
  • Production-Ready: Available via Hugging Face, GitHub, and NVIDIA NIM; integrates with industry-standard APIs. 
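As a quick illustration of the input limits listed above, a client-side pre-check might look like the following sketch. The limits (30-second 1080p video, 16k context) come from the model strengths above; the word-count check is a crude stand-in for real tokenization, not the model's actual tokenizer.

```python
# Sketch: validate a request against Cosmos Reason1-7B's stated input limits
# before sending it. Limits are from the model notes; the token estimate is a
# crude word-count proxy, not the real tokenizer.
MAX_VIDEO_SECONDS = 30
MAX_VIDEO_HEIGHT = 1080      # 1080p
MAX_CONTEXT_TOKENS = 16_000  # 16k context window


def validate_input(video_seconds: float, video_height: int, prompt: str) -> list[str]:
    """Return a list of constraint violations (empty list means the input is OK)."""
    problems = []
    if video_seconds > MAX_VIDEO_SECONDS:
        problems.append(f"video too long: {video_seconds}s > {MAX_VIDEO_SECONDS}s")
    if video_height > MAX_VIDEO_HEIGHT:
        problems.append(f"resolution above 1080p: {video_height}p")
    if len(prompt.split()) > MAX_CONTEXT_TOKENS:  # crude token proxy
        problems.append("prompt likely exceeds the 16k context window")
    return problems


print(validate_input(45, 2160, "Describe the forklift's path through the warehouse."))
```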

Enterprise Use Cases 

Cosmos Reason1-7B is more than a model: it’s a catalyst for building intelligent, adaptive solutions that help enterprises shape a safer, more efficient, and truly connected physical world. 

 

Fig.3 Use Cases

 

  • Reimagine safety and efficiency by empowering AI agents to analyze millions of live streams and recorded videos, instantly verifying protocols and detecting risks in factories, cities, and industrial sites. 
  • Accelerate robotics innovation with advanced reasoning and planning, enabling robots to understand their environment, make methodical decisions, and perform complex tasks—from autonomous vehicles navigating busy streets to household robots assisting with daily chores. 
  • Transform data curation and annotation by automating the selection, labeling, and critiquing of massive, diverse datasets, fueling the next generation of AI with high-quality training data. 
  • Unlock smarter video analytics with chain-of-thought reasoning, allowing systems to summarize events, verify actions, and deliver actionable insights for security, compliance, and operational excellence. 

 

-> Explore the model card on Azure AI Foundry 

 

Also coming soon to Azure AI Foundry are two models of the Cosmos WFM, designed for world generation and data augmentation. 

Cosmos Predict 2.5 2B 

Cosmos Predict 2.5 is a next-generation world foundation model that generates realistic, controllable video worlds from text, images, or videos—all through a unified architecture. 

Trained on 200M+ high-quality clips and enhanced with reinforcement learning, it delivers stronger physics and prompt alignment while cutting compute cost and post-training time for faster Physical AI workflows. 

Cosmos Transfer 2.5 2B 

While Predict 2.5 generates worlds, Transfer 2.5 transforms structured simulation inputs—like segmentation, depth, or LiDAR maps—into photorealistic synthetic data for Physical AI training and development. 

What Sets Cosmos Apart 

  • Built for Physical AI — Purpose-built for robotics, autonomous systems, and embodied agents that understand physics, motion, and spatial environments. 
  • Multimodal World Modeling — Combines images, video, depth, segmentation, LiDAR, and trajectories to create physics-aware, controllable world simulations. 
  • Scalable Synthetic Data Generation — Generates diverse, photorealistic data at scale using structured simulation inputs for faster Sim2Real training and adaptation. 

Microsoft Trellis by Microsoft Research: Enterprise-ready 3D Generation 

Microsoft Trellis is a cutting-edge 3D asset generation model developed by Microsoft Research, designed to create high-quality, versatile 3D assets, complete with shapes and textures, from text or image prompts. Seamlessly integrated as an NVIDIA NIM microservice, Trellis accelerates asset generation and empowers creators with flexible, production-ready outputs.  

  • Quickly generate high-fidelity 3D models from simple text or image prompts, perfect for industries like manufacturing, energy, and smart infrastructure looking to accelerate digital twin creation, predictive maintenance, and immersive training environments. 
  • From virtual try-ons in retail to production-ready assets in media, TRELLIS empowers teams to create stunning 3D content at scale, cutting down production time and unlocking new levels of interactivity and personalization. 
-> Explore the model card on Azure AI Foundry 

Pricing 

The pricing breakdown consists of the Azure Compute charges plus a flat fee per GPU for the NVIDIA AI Enterprise license that is required to use the NIM software.  

  • Pay-as-you-go (per GPU hour) 
  • NIM surcharge: $1 per GPU hour 
  • Azure Compute charges also apply, based on deployment configuration 
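As a worked illustration of that breakdown, the total hourly cost is the VM's compute rate plus the $1 surcharge per GPU. The VM rate used below is a made-up placeholder, not a quoted Azure price:

```python
# Illustrative cost math for a Managed Compute NIM deployment. The $1/GPU-hour
# NIM surcharge is from the pricing note above; the VM rate is a placeholder.
NIM_SURCHARGE_PER_GPU_HOUR = 1.00


def hourly_cost(vm_rate_per_hour: float, gpus_in_vm: int) -> float:
    """Total hourly cost = Azure compute rate + NIM surcharge per GPU."""
    return vm_rate_per_hour + NIM_SURCHARGE_PER_GPU_HOUR * gpus_in_vm


# Hypothetical single-GPU VM at a placeholder $6.00/hour compute rate:
print(f"${hourly_cost(6.00, 1):.2f}/hour")  # → $7.00/hour
```

Check your deployment's VM SKU pricing in the Azure pricing calculator for the real compute component.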
Why use Managed Compute? 

Managed Compute is a deployment option within Azure AI Foundry Models that lets you run large language models (LLMs), SLMs, Hugging Face models, and custom models fully hosted on Azure infrastructure. It is a powerful deployment option for models not available via standard pay-as-you-go endpoints. It gives you:  

  • Custom model support: Deploy open-source or third-party models  
  • Infrastructure flexibility: Choose your own GPU SKUs (NVIDIA A10, A100, H100)  
  • Detailed control: Configure inference servers, protocols, and advanced settings  
  • Full integration: Works with Azure ML SDK, CLI, Prompt Flow, and REST APIs  
  • Enterprise-ready: Supports VNet, private endpoints, quotas, and scaling policies 
NVIDIA NIM Microservices on Azure 

These models are available as NVIDIA NIM™ microservices on Azure AI Foundry. NVIDIA NIM, part of NVIDIA AI Enterprise, is a set of easy-to-use microservices designed for secure, reliable deployment of high-performance AI model inferencing. NIM microservices are pre-built, containerized AI endpoints that simplify deployment and scale across environments. They allow developers to run models securely and efficiently in the cloud environment.
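Once deployed, a NIM endpoint can be consumed like any OpenAI-compatible API. The sketch below only builds the request (nothing is sent until you call `urlopen`); the endpoint URL, key, and model id are placeholders to copy from your deployment's Consume tab.

```python
# Sketch: build (but do not send) an OpenAI-compatible chat request for a
# deployed NIM endpoint. The endpoint URL, key, and model id are placeholders;
# copy the real values from your deployment's Consume tab in Azure AI Foundry.
import json
import urllib.request

ENDPOINT = "https://<your-endpoint>.<region>.inference.ml.azure.com"  # placeholder
API_KEY = "<your-endpoint-key>"                                       # placeholder


def chat_request(prompt: str) -> urllib.request.Request:
    """Build a POST request for the endpoint's chat-completions route."""
    body = json.dumps({
        "model": "nvidia/nemotron-nano-9b-v2",  # assumed model id
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        url=f"{ENDPOINT}/v1/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = chat_request("Summarize this incident report in two sentences.")
print(req.full_url)
# Send with urllib.request.urlopen(req) once the placeholders are filled in.
```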

If you're ready to build smarter, more capable AI agents, start exploring Azure AI Foundry.

Build Trustworthy AI Solutions 

Azure AI Foundry delivers managed compute designed for enterprise-grade security, privacy, and governance. Every deployment of NIM microservices through Azure AI Foundry is backed by Microsoft’s Responsible AI principles and Secure Future Initiative ensuring fairness, reliability, and transparency so organizations can confidently build and scale agentic AI workflows. 

How to Get Started in Azure AI Foundry 
  1. Explore Azure AI Foundry: navigate to ai.azure.com to open the Azure AI Foundry portal. 
  2. From the project selector in the top left, choose an existing Hub project. If you do not have one, create a new Hub project using the “+ Create new” link and select an AI Hub resource. 
  3. Select Model Catalog from the left sidebar menu. 
  4. In the "Collections" filter, select NVIDIA to see all the NIM microservices that are available on Azure AI Foundry. 
  5. Select the NIM you want to use. 
  6. Click Deploy. 
  7. Choose the deployment name and virtual machine (VM) type that you would like to use. VM SKUs that are supported for the selected NIM, as specified on the model card, will be preselected. Note that this step requires sufficient quota in your Azure subscription for the selected VM type; if needed, follow the instructions to request a service quota increase. 
  8. Optionally, use the NVIDIA NeMo Agent Toolkit, designed to orchestrate, monitor, and optimize collaborative AI agents.  

 

Note about the License 

Users are responsible for compliance with the terms of the NVIDIA AI Product Agreement. 

Learn More 
Updated Oct 29, 2025
Version 2.0