Microsoft Azure is committed to giving our customers industry-leading performance for all their workloads. With this mission, we always strive to bring the latest innovation in datacenter hardware and software. Most of the time, this means exciting announcements such as new Azure HPC VMs (HBv2) and 'HPC on the cloud' records. However, it also means that we must update the software stack on our existing hardware to bring new features, improved performance, and better scalability.
The updates to the software stack on Azure's infrastructure in this planned maintenance include performance and reliability enhancements, and security updates. Specifically:
- The biggest upgrade is the expansion of the MPI stack (for H, NC, ND VM families) to enable support for all MPI implementations and versions, and RDMA verbs that can take advantage of the InfiniBand RDMA network for low latency and high bandwidth communication between VMs using SR-IOV.
- A major upgrade of the RDMA capability on InfiniBand, bringing:
  - Improved resiliency for the InfiniBand RDMA network.
  - Bare-metal-like performance.
- Updates to the underlying virtualization layer for deterministic performance and scalability, particularly on platforms with many NUMA nodes.
Why SR-IOV for InfiniBand?
SR-IOV is a specification that allows PCIe resources to be virtualized and shared. On Azure, it enables Accelerated Networking for the Ethernet network (on some VMs) and full MPI support on the HB and HC VMs. This work enables SR-IOV for the RDMA InfiniBand (IB) network, which tightly coupled HPC and AI workloads require, on the other IB-enabled VM families (H, NC, ND). Because MPI is one of the most popular and dominant standards for HPC, this work unlocks the full range of MPI implementations and versions to work natively on Azure. As a further consequence, CUDA-aware MPI implementations and distributed AI libraries for performant multi-node collectives (like NCCL2) can now be used.
MPI on Azure IB-enabled VMs
The HB and HC VMs will no longer be the only VMs with full MPI support. As is the case with the current incarnations of the IB-enabled H, NC and ND VMs, Intel MPI version 5.x will continue to be supported. In addition, later versions of Intel MPI, all other MPIs supported by the OpenFabrics Enterprise Distribution (OFED), OpenMPI, and Nvidia's NCCL2 library (which provides optimized performance for GPUs) will be supported. These enhancements will provide customers with higher InfiniBand bandwidth, lower latencies, and most importantly, better distributed application performance.
InfiniBand use after update
If your workloads require the InfiniBand RDMA network or MPI, changes may be required to the way the workloads are set up to run. For managed services, see service-specific guidance (Azure Batch, Azure Machine Learning). For IaaS setups, we suggest the following:
- Simply use the optimized CentOS-HPC 7.6 VM image or update the VM OS to a version which includes inbox driver support for InfiniBand.
- Otherwise, either manually install the OFED drivers or apply the InfiniBandDriver extensions (Linux and Windows).
- Test the new setup on the HB and HC VMs which are already SR-IOV enabled.
More details on configuring InfiniBand and setting up MPI on the VM are here.
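Once the drivers are in place on a Linux VM, a quick check can confirm that the InfiniBand device is visible before any MPI job is launched. The following is a minimal sketch, not an official Azure tool. It assumes the driver exposes devices through the standard Linux sysfs tree at /sys/class/infiniband, where each port has a state file containing text such as "4: ACTIVE". The function name infiniband_port_states is made up for this illustration.

```python
from pathlib import Path


def infiniband_port_states(sysfs_root="/sys/class/infiniband"):
    """Return {device_name: {port_number: state_text}} for visible IB devices.

    Assumption: the InfiniBand driver populates the standard Linux sysfs
    layout <sysfs_root>/<device>/ports/<port>/state. An empty dict means
    no device is visible (for example, the driver is not installed).
    """
    root = Path(sysfs_root)
    if not root.is_dir():
        return {}

    states = {}
    for device in sorted(root.iterdir()):
        ports = {}
        ports_dir = device / "ports"
        if ports_dir.is_dir():
            for port in sorted(ports_dir.iterdir()):
                state_file = port / "state"
                if state_file.is_file():
                    # Typical content looks like "4: ACTIVE" or "1: DOWN".
                    ports[port.name] = state_file.read_text().strip()
        states[device.name] = ports
    return states


if __name__ == "__main__":
    found = infiniband_port_states()
    if not found:
        print("No InfiniBand devices visible; check the driver installation.")
    for device, ports in found.items():
        for port, state in ports.items():
            print(f"{device} port {port}: {state}")
```

A port that reports an ACTIVE state suggests the RDMA network is ready for use. An empty result usually points back to the driver or image steps above.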
As the update rolls out across the HPC and AI VM families in all regions, you will be notified of the impending update, its schedule, what to expect, and any action you may need to take. In the first phase, the NCv3-series is the first VM family to get this update, in November 2019; its announcement and schedule have already been released. After the update, the setup for running distributed HPC and AI workloads across the IB-enabled HPC and AI VM families in all regions will be unified into one optimized stack for improved performance, scalability, reliability, ease of migration, and familiarity.