
Azure High Performance Computing (HPC) Blog

Azure ND GB300 v6 now Generally Available - Hyper-optimized for Generative and Agentic AI workloads

Nitin_Nagarkatte
Nov 19, 2025

We are pleased to announce the general availability (GA) of ND GB300 v6 virtual machines, delivering the next leap in AI infrastructure. On October 9, we shared the delivery of the first at-scale production cluster with more than 4,600 NVIDIA Blackwell Ultra GPUs in NVIDIA GB300 NVL72 systems, connected through the next-generation NVIDIA InfiniBand network. We have since deployed tens of thousands of GB300 GPUs for production customer workloads and expect to scale to hundreds of thousands. Built on NVIDIA GB300 NVL72 systems, these VMs redefine performance for frontier model training, large-scale inference, multimodal reasoning, and agentic AI.

The ND GB300 v6 series enables customers to:

  • Deploy trillion-parameter models with unprecedented throughput.
  • Accelerate inference for long-context and multimodal workloads.
  • Scale seamlessly at high bandwidth for large-scale training workloads.

In recent benchmarks, ND GB300 v6 achieved over 1.1 million tokens per second on Llama 2 70B inference workloads - a 27% uplift over ND GB200 v6. This performance breakthrough enables customers to serve long-context, multimodal, and agentic AI models with unmatched speed and efficiency.
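As a back-of-envelope check on the quoted uplift, the implied ND GB200 v6 baseline can be recovered from the two numbers in the paragraph above (the baseline is derived arithmetic, not a published benchmark figure):

```python
# Derive the implied ND GB200 v6 baseline from the quoted ND GB300 v6
# throughput (1.1M tokens/s) and the 27% uplift. This is arithmetic on
# the blog's own numbers, not an official GB200 benchmark result.
gb300_tokens_per_s = 1_100_000
uplift = 0.27

# gb300 = baseline * (1 + uplift)  =>  baseline = gb300 / (1 + uplift)
gb200_baseline = gb300_tokens_per_s / (1 + uplift)
print(f"Implied ND GB200 v6 baseline: ~{gb200_baseline:,.0f} tokens/s")
# Implied ND GB200 v6 baseline: ~866,142 tokens/s
```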

With the general availability of ND GB300 v6 VMs, Microsoft strengthens its long-standing collaboration with NVIDIA by leading the market in delivering the latest GPU innovations, reaffirming our commitment to world-class AI infrastructure.

The ND GB300 v6 systems use a rack-scale design: each rack hosts 18 VMs, for a total of 72 GPUs interconnected by high-speed NVLink. Each VM has 2 NVIDIA Grace CPUs and 4 NVIDIA Blackwell Ultra GPUs. Each NVLink-connected rack contains:

  • 72 NVIDIA Blackwell Ultra GPUs (with 36 NVIDIA Grace CPUs).
  • 800 gigabits per second (Gb/s) of per-GPU cross-rack scale-out bandwidth via next-generation NVIDIA Quantum-X800 InfiniBand (2x ND GB200 v6).
  • 130 terabytes (TB) per second of NVIDIA NVLink bandwidth within rack.
  • 37 TB of fast memory (~20 TB HBM3e + ~17 TB LPDDR).
  • Up to 1,440 petaflops (PFLOPS) of FP4 Tensor Core performance (1.5x ND GB200 v6).
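The per-VM share of these rack-level figures follows directly from the 18-VM, 72-GPU layout described above. A simple division sketch (the per-VM splits are derived here, not quoted from Azure's spec sheet):

```python
# Derive per-VM shares from the rack-level numbers in the list above.
# The per-VM splits are arithmetic, not official Azure specifications.
gpus_per_rack = 72
vms_per_rack = 18
fast_memory_tb = 37          # ~20 TB HBM3e + ~17 TB LPDDR
fp4_pflops = 1440            # up to, FP4 Tensor Core, per rack

gpus_per_vm = gpus_per_rack // vms_per_rack           # 4 GPUs per VM
memory_per_vm_tb = fast_memory_tb / vms_per_rack      # ~2.06 TB per VM
pflops_per_vm = fp4_pflops / vms_per_rack             # up to 80 PFLOPS per VM

print(gpus_per_vm, round(memory_per_vm_tb, 2), pflops_per_vm)
# 4 2.06 80.0
```

The 4-GPU result matches the VM description in the paragraph above, which is a useful consistency check on the rack totals.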

Together, NVLink and XDR InfiniBand enable GB300 systems to behave as a unified compute and memory pool, minimizing latency, maximizing bandwidth, and dramatically improving scalability. Within a rack, NVLink enables coherent memory access and fast synchronization for tightly coupled workloads. Across racks, XDR InfiniBand ensures ultra-low latency, high-throughput communication with SHARP offloading, maintaining sub-100 µs latency for cross-node collectives.
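To get a feel for what the 800 Gb/s per-GPU scale-out link implies, here is a first-order transfer-time estimate. The 10 GB payload is an illustrative assumption, and the model ignores protocol overhead, congestion, and SHARP in-network reduction, so it is an optimistic lower bound rather than a benchmark:

```python
# First-order estimate: time to push one GPU's data across the 800 Gb/s
# per-GPU Quantum-X800 link. The 10 GB payload is an assumed example;
# protocol overhead and SHARP offload are ignored, so this is an
# optimistic lower bound, not a measured result.
link_gbps = 800        # gigabits per second, per GPU (from the spec list)
payload_gb = 10        # example payload size in gigabytes (assumption)

payload_gbits = payload_gb * 8
transfer_s = payload_gbits / link_gbps
print(f"~{transfer_s * 1000:.0f} ms to move {payload_gb} GB at {link_gbps} Gb/s line rate")
# ~100 ms to move 10 GB at 800 Gb/s line rate
```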

Azure provides an end-to-end AI platform that enables customers to build, deploy, and scale AI workloads efficiently on GB300 infrastructure. Services like Azure CycleCloud and Azure Batch simplify the setup and management of HPC and AI environments, allowing organizations to dynamically adjust resources, integrate leading schedulers, and run containerized workloads at massive scale. With tools such as CycleCloud Workspace for Slurm, users can create and configure clusters without prior expertise, while Azure Batch handles millions of parallel tasks, ensuring cost and resource efficiency for large-scale training.

For cloud-native AI, Azure Kubernetes Service (AKS) offers rapid deployment and management of containerized workloads, complemented by platform-specific optimizations for observability and reliability. Whether using Kubernetes or custom stacks, Azure delivers a unified suite of services to maximize performance and scalability.

Learn More & Get Started

 

Updated Nov 14, 2025
Version 1.0