Azure Infrastructure Blog
Reimagining AI at scale: NVIDIA GB300 NVL72 on Azure

Oct 28, 2025

By Gohar Waqar, CVP of Cloud Hardware Infrastructure Engineering, Microsoft

 

Microsoft was the first hyperscaler to deploy the NVIDIA GB300 NVL72 infrastructure at scale – with a fully integrated platform engineered to deliver unprecedented compute density in a single rack to meet the demands of agentic AI workloads. Each GB300 NVL72 rack packs 72 NVIDIA Blackwell Ultra GPUs and 36 NVIDIA Grace™ CPUs with up to ~136 kW of IT load, enabled by Microsoft’s custom liquid cooling heat exchanger unit (HXU) system. Using a systems approach to architect GB300 clusters, Azure’s new NDv6 GB300 VMs include robust infrastructure innovation across every layer of the stack, including smart rack management for fleet health, innovative cooling systems, and efficient deployment features that make scaling high-density AI clusters easier than ever.

With purpose-built hardware engineered for a unified platform – from silicon to systems to software – Azure’s deployment of NVIDIA GB300 NVL72 is a clear representation of Microsoft’s commitment to raising the bar on accelerated computing, enabling training of multitrillion-parameter models and high throughput on inference workloads.

Unique features of NVIDIA GB300 NVL72 system on Microsoft Azure

Ultra-dense AI rack - The GB300 rack integrates 72 NVIDIA Blackwell Ultra GPUs (each with 288 GB of HBM3e) and 36 Grace CPUs, effectively delivering supercomputer-class performance in a single rack.

Advanced liquid cooling - Each rack uses direct-to-chip liquid cooling. In air-cooled data centers, external liquid cooling heat exchanger units (HXUs) act as radiators for each rack, dissipating ~136 kW to room air. In facilities with chilled water, the rack connects directly to facility water.

Smart rack management - The system is equipped with an embedded controller that monitors power, temperature, coolant flow, and leak sensors in real time. It can auto-throttle or shut down components if conditions go out-of-range and provide full telemetry for remote fleet diagnostics.

Fully integrated security and offload features - Our unique design also includes the Azure Integrated Hardware Security Module (HSM) chip and Azure Boost offload accelerator for advanced I/O and security performance.

Scalable datacenter deployment - GB300 arrives as an integrated rack (compute trays, NVIDIA NVLink™ fabric, cooling, and power shelves pre-installed). Deployment is streamlined: the rack requires only connectivity, power, and cooling hookups plus a set of initial checks, after which it self-regulates its cooling and power distribution.

Azure server featuring NVIDIA Blackwell Ultra GPUs and Grace CPUs

Purpose-built architecture designed for rapid deployment and scale

At its core, GB300 is built to maximize AI compute density within a standard data center footprint. It is a single-rack AI inference and training cluster with unprecedented component density. Compared to the previous generation (NVIDIA GB200 NVL72), it introduces higher-performance GPUs (from ~1.2 kW to ~1.4 kW each with more HBM3e memory), a ~50% boost in NVFP4 throughput and a revamped power/cooling design to handle ~20% greater thermal and power load. The liquid cooling system for the GPU module is enhanced with a new cold plate and improved leak detection assembly for safe, high-density operation.
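To put the per-rack figures in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the approximate numbers quoted in this post (72 GPUs, 288 GB of HBM3e per GPU, ~1.2 kW and ~1.4 kW per GPU, ~136 kW per rack); the function and variable names are illustrative, and the results are estimates rather than official specifications.

```python
# Back-of-the-envelope rack totals from the approximate figures in this post.
# Names are illustrative; results are estimates, not official specifications.

GPUS_PER_RACK = 72
HBM_PER_GPU_GB = 288
GPU_POWER_KW_PREV = 1.2   # approximate per-GPU power, previous generation
GPU_POWER_KW_NEW = 1.4    # approximate per-GPU power, GB300 generation
RACK_IT_LOAD_KW = 136.0   # approximate upper bound on IT load per rack


def rack_totals() -> dict:
    """Derive aggregate memory and GPU power figures for one rack."""
    total_hbm_gb = GPUS_PER_RACK * HBM_PER_GPU_GB
    gpu_power_kw = GPUS_PER_RACK * GPU_POWER_KW_NEW
    return {
        "total_hbm_gb": total_hbm_gb,
        "total_hbm_tb": total_hbm_gb / 1000,  # decimal terabytes
        "gpu_power_kw": gpu_power_kw,
        # Fraction of the quoted rack load attributable to the GPUs alone.
        "gpu_share_of_rack": gpu_power_kw / RACK_IT_LOAD_KW,
        # Relative per-GPU power increase over the previous generation.
        "per_gpu_power_increase": GPU_POWER_KW_NEW / GPU_POWER_KW_PREV - 1,
    }


if __name__ == "__main__":
    for name, value in rack_totals().items():
        print(f"{name}: {value:.3f}")
```

Under these figures, the GPUs alone account for roughly three quarters of the quoted rack load, and the per-GPU power increase of about 17% is in line with the ~20% greater thermal and power load the rack is designed to handle.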

Innovations in our purpose-built Azure Boost accelerator for I/O offload unlock higher bandwidth, while our custom Datacenter-secure Control Module (DC-SCM) introduces a secure, modular control plane built on a hardware root of trust, backed by the Azure Integrated Hardware Security Module (HSM). Together, these advancements enable fleet-wide manageability, strengthening security and operational resilience at scale to meet the demands of hyperscale environments.

Azure server featuring NVIDIA GB300 NVL72, enhanced by Azure Boost and Datacenter-secure Control Module

Cooling systems designed for deployability and global resiliency

To dissipate ~136 kW of heat per rack, GB300 relies on direct liquid cooling for all major components. To offer resiliency and wide deployability across Microsoft’s datacenter footprint, our cooling designs support both facility-water and air-cooled environments. Both approaches use a closed coolant loop inside the rack with a treated water-glycol fluid. Leak detection cables line each tray, and the base of the rack is equipped with smart management protocols to address potential leaks. Using this method, liquid cooling is highly efficient and reliable – it allows GB300 to run with warmer coolant temperatures than traditional datacenter water, improving overall power usage effectiveness (PUE).
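As a rough illustration of what removing ~136 kW with liquid involves, the sketch below applies the standard heat-transfer relation Q = ṁ · c_p · ΔT to estimate the coolant flow a rack would need. Only the 136 kW figure comes from this post. The specific heat, density, and temperature rise are placeholder assumptions for a generic water-glycol mix, not properties of the actual Azure coolant or loop design.

```python
# Estimate coolant flow needed to carry away a given heat load,
# using Q = m_dot * c_p * delta_T. All fluid properties below are
# ASSUMED placeholder values, not the actual Azure coolant parameters.

ASSUMED_CP_J_PER_KG_K = 3900.0    # assumed specific heat of a water-glycol mix
ASSUMED_DENSITY_KG_PER_L = 1.03   # assumed density of the mix


def required_flow_l_per_min(
    heat_load_kw: float,
    delta_t_k: float,
    cp_j_per_kg_k: float = ASSUMED_CP_J_PER_KG_K,
    density_kg_per_l: float = ASSUMED_DENSITY_KG_PER_L,
) -> float:
    """Return the volumetric flow (L/min) that absorbs heat_load_kw
    with a coolant temperature rise of delta_t_k kelvin."""
    if delta_t_k <= 0:
        raise ValueError("delta_t_k must be positive")
    mass_flow_kg_per_s = (heat_load_kw * 1000.0) / (cp_j_per_kg_k * delta_t_k)
    return mass_flow_kg_per_s / density_kg_per_l * 60.0


if __name__ == "__main__":
    # 136 kW is the approximate rack load from the post; the 10 K rise
    # across the rack is an assumption chosen only for illustration.
    flow = required_flow_l_per_min(heat_load_kw=136.0, delta_t_k=10.0)
    print(f"Estimated flow: {flow:.0f} L/min")
```

The relation also shows the trade-off a cooling design works with: for a fixed heat load, allowing a larger temperature rise across the rack reduces the required flow proportionally.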

Azure AI infrastructure featuring NVIDIA GB300 NVL72, with advanced cooling and rack management capabilities

Smart management, fleet health & diagnostics

Each GB300 rack is a “smart IT rack” with an embedded management controller that oversees its operation. This controller is supported by a rack control module that serves as the brain of the rack, providing comprehensive monitoring and automation for power, cooling, and health diagnostics. By delivering an integrated “single pane of glass” view for each rack’s health, the GB300 makes management at scale feasible despite the complexity.

Once installed, the rack self-regulates its power and thermal environment, automatically adjusting fan or pump speeds and isolating faults. This reduces the manual effort needed to keep the cluster running optimally, so customers can focus on their workloads with confidence that the infrastructure is continuously self-monitoring and safeguarding itself. In addition, the rack control module monitors and moderates GPU peak power consumption and handles other power management scenarios. These robust design choices reflect the fleet-first mindset: maximizing uptime and easing diagnostics in large deployments.
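The general shape of this kind of threshold-based protection can be sketched in a few lines of Python. This is a hypothetical illustration of the idea only, not the actual rack controller logic: the `Limits`, `Reading`, and `decide` names are invented for this example, and every threshold except the ~136 kW rack load is a made-up placeholder.

```python
# Hypothetical sketch of a threshold-based rack protection policy.
# This is NOT the actual Azure rack controller; all names and all
# thresholds other than the ~136 kW rack load are invented placeholders.
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NORMAL = "normal"
    THROTTLE = "throttle"
    SHUTDOWN = "shutdown"


@dataclass(frozen=True)
class Limits:
    max_power_kw: float = 136.0    # approximate rack load from the post
    warn_coolant_c: float = 45.0   # placeholder: throttle above this
    max_coolant_c: float = 55.0    # placeholder: shut down at or above this
    min_flow_l_per_min: float = 150.0  # placeholder: minimum coolant flow


@dataclass(frozen=True)
class Reading:
    power_kw: float
    coolant_c: float
    flow_l_per_min: float
    leak_detected: bool


def decide(reading: Reading, limits: Limits = Limits()) -> Action:
    """Map one telemetry sample to the most severe action it warrants."""
    # Conditions that cannot be handled by slowing down: stop immediately.
    if reading.leak_detected or reading.coolant_c >= limits.max_coolant_c:
        return Action.SHUTDOWN
    # Out-of-range but recoverable conditions: reduce load.
    if (
        reading.power_kw > limits.max_power_kw
        or reading.coolant_c >= limits.warn_coolant_c
        or reading.flow_l_per_min < limits.min_flow_l_per_min
    ):
        return Action.THROTTLE
    return Action.NORMAL


if __name__ == "__main__":
    sample = Reading(power_kw=128.0, coolant_c=47.5,
                     flow_l_per_min=210.0, leak_detected=False)
    print(decide(sample).value)
```

Checking the shutdown conditions first means a sample that trips several limits at once always maps to the most severe response, which is one simple way to keep such a policy predictable across a large fleet.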

Person working on installation of an Azure server featuring NVIDIA GB300 NVL72

Efficient and streamlined deployment

As Microsoft scales to thousands of GB300 racks for increased AI supercomputing capacity, fast and repeatable deployment is critical.

GB300 introduces a new era of high-density AI infrastructure, tightly integrating cutting-edge hardware (Grace CPUs, Blackwell Ultra GPUs, and NVLink connectivity) with innovations in both power delivery and liquid cooling. Crucially, it does so with an eye toward operational excellence: built-in management, health diagnostics, and deployment-friendly design mean that scaling up AI clusters with GB300 can be done rapidly and reliably.

With its unprecedented compute density, intelligent self-management, and flexible cooling options, the GB300 platform enables organizations to scale rapidly with the latest AI supercomputer hardware while maintaining the reliability and serviceability expected in Azure’s promise to customers. GB300 unlocks next-level AI performance delivered in a package engineered for real-world efficiency and fleet-scale success.

Updated Oct 28, 2025
Version 1.0