Performance at Scale: The Role of Interconnects in Azure HPC & AI Infrastructure
Could you please post the actual network bandwidth utilization of these workloads at scale? Network microbenchmarks are useful for validating scale-up and scale-out, and they serve as a marketing tool to justify the high bandwidth between accelerators. However, real distributed, scalable (strong-scaling) applications have far lower network bandwidth requirements, which is an opportunity to lean out the networking infrastructure at large scale. One eighth of the maximum bandwidth is sufficient for scale-out; that is, a single 400G NIC for every 8 GPUs is enough at very large scale, or perhaps two, one per group of CPU + 4 GPUs.
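To make that arithmetic concrete, here is a rough helper for sanity-checking per-node scale-out bandwidth; the figures in the example are illustrative assumptions on my part, not measured Azure numbers:

```python
# Helper to sanity-check scale-out bandwidth needs per node.
# The example figures at the bottom are illustrative assumptions only.

def required_node_gbps(cross_node_bytes_per_step: float, step_time_s: float) -> float:
    """Cross-fabric bandwidth one node needs, in Gbit/s, if it must move
    cross_node_bytes_per_step off-node every training step."""
    return cross_node_bytes_per_step * 8 / step_time_s / 1e9

# Example: a node that pushes 100 GB of already node-reduced gradients per
# 4-second step (hierarchical all-reduce, most traffic stays on NVLink inside
# the node) needs well under one 400G NIC of scale-out bandwidth.
print(required_node_gbps(100e9, 4.0))   # -> 200.0 Gbit/s
```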
Also, could you comment on the reliability challenges observed at large scale (8k GPUs), where the mean time between failures is a few hours? What processes are in place to recover quickly when a month-long distributed training run of a foundation model is disrupted every few hours by unavoidable hardware/software failures?
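For context, the kind of recovery loop I have in mind is periodic checkpoint-and-resume, sketched minimally below in PyTorch; the paths, intervals, and toy model are my own illustrative assumptions, not a description of how Azure or the post's authors actually handle failures:

```python
# Minimal sketch of periodic checkpointing and automatic resume after a failure.
# CKPT_DIR, SAVE_EVERY, and the toy model are illustrative assumptions.
import glob
import os
import torch
import torch.nn as nn

CKPT_DIR = "checkpoints"      # hypothetical checkpoint location
SAVE_EVERY = 500              # steps between checkpoints
TOTAL_STEPS = 10_000

model = nn.Linear(1024, 1024)                    # stand-in for the real model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

def latest_checkpoint():
    paths = sorted(glob.glob(os.path.join(CKPT_DIR, "step_*.pt")))
    return paths[-1] if paths else None

# Resume from the newest checkpoint if a previous run was interrupted.
start_step = 0
ckpt_path = latest_checkpoint()
if ckpt_path is not None:
    state = torch.load(ckpt_path, map_location="cpu")
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    start_step = state["step"] + 1

os.makedirs(CKPT_DIR, exist_ok=True)
for step in range(start_step, TOTAL_STEPS):
    x = torch.randn(32, 1024)                    # synthetic batch
    loss = model(x).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if step % SAVE_EVERY == 0:
        torch.save(
            {"model": model.state_dict(),
             "optimizer": optimizer.state_dict(),
             "step": step},
            os.path.join(CKPT_DIR, f"step_{step:08d}.pt"),
        )
```

At 8k-GPU scale the interesting questions are how often you can afford to checkpoint, how fast a failed node is detected and replaced, and how much wall-clock time is lost replaying work since the last checkpoint.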
Reducing the number of network ports is not only cheaper but also proportionally more reliable (fewer NICs, transceivers, cables, and switches), without compromising the scalability of the applications.
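A quick series-reliability sketch shows what "proportionally more reliable" means here; it assumes independent failures with exponential lifetimes, and every component count and per-unit MTBF figure is a made-up illustrative number, not vendor data:

```python
# Series-reliability sketch: fewer fabric components -> longer fabric MTBF.
# All counts and per-unit MTBF figures are illustrative assumptions.

def fabric_mtbf_hours(components):
    """components: dict name -> (count, per-unit MTBF in hours).
    Failure rates add in a series system, so MTBF = 1 / sum(count / mtbf)."""
    return 1.0 / sum(count / mtbf for count, mtbf in components.values())

# Hypothetical 1000-node cluster: 8 NICs per node vs. 1 NIC per node.
full = {"nic": (8000, 2e5), "transceiver": (16000, 3e5), "cable": (8000, 1e6)}
lean = {"nic": (1000, 2e5), "transceiver": (2000, 3e5), "cable": (1000, 1e6)}

print(f"8 NICs/node fabric MTBF: {fabric_mtbf_hours(full):.1f} h")
print(f"1 NIC/node  fabric MTBF: {fabric_mtbf_hours(lean):.1f} h")  # ~8x longer
```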
Finally, please post your training throughput numbers with and without SHARP at scale, so that the value of SHARP, if any, can be quantified.