
Azure High Performance Computing (HPC) Blog

Accelerating the Intelligence Age with Azure AI Infrastructure and the GA of ND GB200 v6

Mar 18, 2025

By Matt Vegas, Principal Product Manager, Microsoft Azure

Today we are thrilled to announce the General Availability of Azure's latest AI infrastructure virtual machines, the ND GB200 v6. Azure is proud to be one of the first cloud service providers to launch a 4,000-GPU NVIDIA GB200 Grace Blackwell powered supercomputing cluster for training state-of-the-art models and accelerating production AI inference deployments at scale. These cutting-edge VMs, accelerated by the NVIDIA Blackwell platform, represent the next frontier in AI supercomputing.

AI reasoning models and agentic AI (systems capable of critical thinking, problem solving, and task execution) are coevolving at an exponential rate and revolutionizing the ways organizations operate. From AI research labs on the frontier of algorithmic development to organizations that have adopted AI tools in their daily operations, we are leading an AI platform shift that requires infrastructure delivering performance, cost efficiency, reliability, and near-instantaneous response times.

This latest generation of AI supercomputers marks a leap in hardware and infrastructure that will deliver unprecedented improvements in application performance for AI training and inferencing. These systems leverage a new rack-scale architecture that enables up to 72 Blackwell GPUs to act as a single exascale computer, along with liquid cooling in the datacenter to optimize performance and rack density. Furthermore, each NVIDIA GB200 Grace Blackwell Superchip connects an Arm-based NVIDIA Grace CPU with two NVIDIA Blackwell GPUs over the new high-bandwidth NVIDIA NVLink-C2C interconnect, so the system is designed to deliver the next generation of AI and frontier models.

"As we push the boundaries of AI, our partnership with Azure and the introduction of the NVIDIA Blackwell platform represent a significant leap forward," said Ian Buck, Vice President of Hyperscale and HPC at NVIDIA. "The NVIDIA GB200 NVL72, with its unparalleled performance and connectivity, tackles the most complex AI workloads, enabling businesses to innovate faster and more securely. By integrating this technology with Azure's secure infrastructure, we are unlocking the potential of reasoning AI."

Microsoft Azure has also successfully proven the performance of these new AI supercomputing clusters and is pleased to share that, running the Llama 70B model on the NVIDIA GB200 NVL72 on Azure, we can generate over 860,000 tokens/sec of throughput, a 9x increase per rack compared to the previous-generation ND H100 v5 VMs.

The next frontier of AI supercomputing - Azure launches ND GB200 v6 Virtual Machine Series:

Today marks the general availability of our Azure ND GB200 v6 Virtual Machine (VM), powered by the NVIDIA GB200 NVL72, which connects 36 Grace CPUs and 72 Blackwell GPUs into a single 72-GPU NVLink domain. This rack-scale system acts like a single massive exascale GPU, delivering the application performance gains and step-function architectural improvements required to train and host the next generation of reasoning models, AI agents, and LLMs.

  • NVIDIA Blackwell GPUs: The NVIDIA Blackwell architecture introduces the largest GPU ever built, with 2.5x the transistors of Hopper GPUs, a new second-generation transformer engine, and a new FP4 datatype. It includes new HBM3e GPU memory with a 36% increase in High Bandwidth Memory (HBM) capacity to 192 GB per GPU and a 67% increase in HBM bandwidth to 8 TB/s per GPU, compared to the previous generation of Azure ND H200 v5 VMs.
  • NVIDIA Grace CPU Superchip: High performance and power efficient Arm® Neoverse™ V2 cores connected with two Blackwell GPUs via the NVIDIA NVLink-C2C interface enabling high bandwidth access to CPU memory for checkpointing and offloading.
  • NVIDIA Quantum-2 InfiniBand networking: 400 Gb/s of dedicated bandwidth to each GPU, for 1.6 Tb/s per VM and 28.8 Tb/s per GB200 NVL72 rack, in a non-blocking fat-tree network scaling to hundreds of thousands of GPUs.
  • NVIDIA GB200 NVL72: Fifth-generation NVLink with 1.8 TB/s of bidirectional bandwidth per GPU (2x Hopper), connecting 72 Blackwell GPUs per rack and enabling the system to operate as a single 72-GPU NVLink domain. This 72-GPU rack-scale system, comprised of 18 compute nodes with NVIDIA GB200 Grace Blackwell Superchips, delivers up to 1.4 exaFLOPS of FP4 Tensor Core throughput, 13.5 TB of shared high-bandwidth memory, 130 TB/s of cross-sectional NVLink bandwidth, and 28.8 Tb/s of scale-out networking.
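As a quick sanity check, the rack-level figures above follow directly from the per-GPU numbers quoted in this post; a minimal sketch in Python:

```python
# Derive the ND GB200 v6 rack-level figures from the per-GPU numbers
# quoted above: 18 nodes x 4 Blackwell GPUs = 72 GPUs per NVL72 rack.
GPUS_PER_VM = 4
GPUS_PER_RACK = 72

# InfiniBand scale-out: 400 Gb/s per GPU
ib_per_gpu_gbps = 400
ib_per_vm_tbps = ib_per_gpu_gbps * GPUS_PER_VM / 1000       # 1.6 Tb/s per VM
ib_per_rack_tbps = ib_per_gpu_gbps * GPUS_PER_RACK / 1000   # 28.8 Tb/s per rack

# HBM3e: 192 GB per GPU pooled across the NVLink domain
hbm_per_rack_tb = 192 * GPUS_PER_RACK / 1024                # ~13.5 TB shared

# NVLink: 1.8 TB/s bidirectional per GPU
nvlink_rack_tbs = 1.8 * GPUS_PER_RACK                       # ~130 TB/s

print(ib_per_vm_tbps, ib_per_rack_tbps, hbm_per_rack_tb, round(nvlink_rack_tbs))
```

Each total is a simple product of the per-GPU figure and the GPU count, which is why the marketing numbers line up across the VM and rack levels.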

To support the massive data demands of AI training and inferencing, the ND GB200 v6 is complemented by Azure Blob Storage, which provides exabytes of capacity and terabits of throughput. This ensures seamless access to the vast datasets needed to feed billion- or trillion-parameter models, empowering organizations to push the boundaries of AI without data bottlenecks.

Customer momentum continues:

Today we are working with some of the leading AI innovators to support their AI infrastructure needs, including Black Forest Labs.

"We are expanding our partnership with Microsoft Azure to combine BFL's unique research expertise in generative AI with Azure's powerful infrastructure. This collaboration enables us to    build and deliver the best possible image and video models faster and at greater scale, providing our customers with state-of-the-art visual AI capabilities for media production, advertising, product design, content creation and beyond." - Robin Rombach, CEO

End-to-end AI on Azure:

Our suite of Azure services and tools ensures that customers can harness the full potential of Azure ND GB200 v6 VMs to build, deploy, and run AI workloads at scale, accelerating innovation while maintaining cost and resource efficiency.

With Azure CycleCloud and Azure Batch, organizations can quickly set up and manage their AI training environments, focusing on innovation and achieving their business goals. Azure CycleCloud simplifies the management of HPC/AI environments, allowing users to set up clusters and adjust resources dynamically to meet workload demands. It integrates leading job schedulers and filesystems seamlessly with Azure's infrastructure, supporting a range of AI training workloads. Azure CycleCloud Workspace for Slurm allows users to effortlessly create, configure, and deploy pre-defined Slurm clusters with CycleCloud on Azure, all without requiring any prior knowledge of Azure or Slurm.
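To make this concrete, here is a minimal sketch of what a multi-node training submission on such a Slurm cluster might look like. The partition behavior, script name, and node counts are hypothetical placeholders, not a prescribed configuration:

```shell
#!/bin/bash
# Hypothetical Slurm batch script for a multi-node training job on a
# CycleCloud-managed cluster; names and counts below are illustrative.
#SBATCH --job-name=llm-train
#SBATCH --nodes=4                 # four ND GB200 v6 VMs
#SBATCH --ntasks-per-node=4       # one task per Blackwell GPU
#SBATCH --gpus-per-node=4
#SBATCH --exclusive

# Launch one training process per GPU across all allocated nodes.
srun python train.py --config config.yaml
```

Because CycleCloud provisions the scheduler and filesystems, the same submission pattern works whether the cluster has four nodes or hundreds.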

Azure Batch enables large-scale parallel computing, handling thousands to millions of tasks per job and supporting containerized workloads. These platforms provide the tools and capabilities to efficiently scale AI training processes, letting organizations concentrate on core research and business objectives.

With Kubernetes' increasing popularity as a framework for running cloud-native AI workloads, Azure Kubernetes Service offers an excellent choice for rapid deployment of Kubernetes clusters and management of containerized AI workloads at scale. We are building platform-specific optimizations, including enhanced observability, to ensure that our customers have the reliability and performance necessary for running large-scale AI workloads.
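As an illustration, a containerized workload on an AKS GPU node pool requests GPUs through the standard Kubernetes resource mechanism. The manifest below is a minimal sketch: the pod name and container image are illustrative, and GPU scheduling assumes the NVIDIA device plugin is installed on the node pool:

```yaml
# Hypothetical smoke-test pod requesting the four Blackwell GPUs on one VM.
apiVersion: v1
kind: Pod
metadata:
  name: gb200-smoke-test                    # illustrative name
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04   # illustrative image
    command: ["nvidia-smi"]                 # print visible GPUs and exit
    resources:
      limits:
        nvidia.com/gpu: 4                   # GPUs exposed by the device plugin
```

The `nvidia.com/gpu` resource limit is what lets the Kubernetes scheduler place the pod on a GPU-equipped node and grant the container exclusive access to those devices.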

Whether you are deploying Kubernetes or your own custom stack, Azure offers an integrated set of services and tools necessary for you to manage AI workloads at scale with high efficiency and reliability, maximizing the true potential of the GB200 infrastructure.

Continued Partnership of Innovation with NVIDIA:

Leveraging our momentum on NVIDIA GB200, along with decades of building world-class supercomputers, we are committed to always bringing the latest NVIDIA platforms to the cloud, including the newly announced NVIDIA GB300 NVL72 based on the NVIDIA Blackwell Ultra architecture. Stay tuned for more announcements like this from NVIDIA GTC.

Learn more:

For more information and to request access to Microsoft's latest Virtual Machines for AI workloads, please register here.

Learn more from our post on unpacking the performance of Microsoft Azure ND GB200 v6 Virtual Machines, and see how you can get started with our benchmarking guide.


Updated Mar 19, 2025
Version 3.0