Nov 24 2021 07:15 AM
Written by Sherry Wang, Senior Program Manager, Azure HPC and AI
Today, Microsoft announced the general availability of a brand-new virtual machine (VM) series in Azure, the NDm A100 v4 series, featuring NVIDIA A100 Tensor Core 80 GB GPUs. This expands Azure's leadership-class AI supercomputing scalability in the public cloud, building on our June general availability of the original ND A100 v4 instances. It also adds another public cloud first: Azure ND A100 v4 VMs claim four official places in the TOP500 supercomputing list. This milestone is thanks to a class-leading design with NVIDIA Quantum InfiniBand networking, featuring In-Network Computing, 200 Gb/s of bandwidth and GPUDirect RDMA for each GPU, and an all-new PCIe Gen 4.0-based architecture.
We live in the era of large-scale AI models, and the demand for large-scale computing keeps growing. The original ND A100 v4 series features NVIDIA A100 Tensor Core GPUs, each equipped with 40 GB of HBM2 memory. The new NDm A100 v4 series doubles this to 80 GB per GPU and adds a 30 percent increase in GPU memory bandwidth for today’s most data-intensive workloads. RAM available to the virtual machine has also increased to 1,900 GB per VM, giving customers with large datasets and models a proportional increase in memory capacity to support novel data management techniques, faster checkpointing, and more.
The high-memory NDm A100 v4 series brings AI supercomputer power to the masses, creating opportunities for all businesses to use it as a competitive advantage. Cutting-edge AI customers are using both 40 GB ND A100 v4 VMs and 80 GB NDm A100 v4 VMs at scale for large-scale production AI and machine learning workloads, and are seeing impressive performance and scalability. They include OpenAI for research and products, Meta for its leading AI research, Nuance for its comprehensive AI-powered voice-enabled solution, numerous Microsoft internal teams for large-scale cognitive science model training, and many more.