Azure is pleased to share results from our MLPerf Inference v2.1 submission. For this submission, we benchmarked our NC A100 v4-series, NDm A100 v4-series, and NVads A10 v5-series virtual machines (VMs). They are powered by the latest NVIDIA A100 PCIe Tensor Core GPUs, NVIDIA A100 SXM Tensor Core GPUs, and NVIDIA A10 Tensor Core GPUs, respectively. These offerings are our flagship VM types for AI inference and training, and they let our customers address inferencing needs ranging from 1/6 of a GPU to eight GPUs. All three series are available, making AI inference accessible to all. We are excited to see what new breakthroughs our customers will make using these VMs.
In this document, we share our MLPerf Inference v2.1 benchmark results, along with the best practices and configuration details you need to replicate them. The results show that Azure is committed to providing our customers with the latest GPU offerings, and that these offerings perform in line with on-premises systems, are available on demand in the cloud, and scale to AI workloads of all sizes.
MLPerf™ from MLCommons®
MLCommons® is an open engineering consortium of AI leaders from academia, research labs, and industry whose mission is to “build fair and useful benchmarks” that provide unbiased evaluations of training and inference performance for hardware, software, and services, all conducted under prescribed conditions. MLPerf™ Inference benchmarks consist of real-world, compute-intensive AI workloads that best simulate customers’ needs. MLPerf™ tests are transparent and objective, so technology decision makers can rely on the results to make informed buying decisions.
Highlights of Performance Results
The highlights of the results obtained in the MLPerf Inference v2.1 benchmarking exercise are shown below.
NC A100 v4-series achieved 54.2K+ samples/s in the RNN-T offline scenario
NDm A100 v4-series achieved 26+ samples/s in the 3D U-Net offline scenario
NVads A10 v5-series achieved 24.7K+ queries/s in the ResNet50 server scenario
Once your machine is deployed and configured, create a folder for the scripts and get the scripts from the MLPerf Inference v2.1 results repository:
cd /mnt/resource_nvme
git clone https://github.com/mlcommons/inference_results_v2.1.git
cd inference_results_v2.1/closed/Azure
Create folders for the data and get the ResNet50 data:
export MLPERF_SCRATCH_PATH=/mnt/resource_nvme/scratch
mkdir -p $MLPERF_SCRATCH_PATH
mkdir $MLPERF_SCRATCH_PATH/data $MLPERF_SCRATCH_PATH/models $MLPERF_SCRATCH_PATH/preprocessed_data
cd $MLPERF_SCRATCH_PATH/data && mkdir imagenet && cd imagenet
Download the ImageNet data, which is available online, into this imagenet folder, then go back to the scripts folder (inference_results_v2.1/closed/Azure).
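Before moving on, it can help to confirm that the scratch layout described above is in place. The following is a minimal sketch; `check_scratch_layout` is a hypothetical helper written for this walkthrough, not part of the MLPerf repository, and it only checks that the directories exist, not that the ImageNet data is complete.

```shell
# Sketch: confirm the scratch directories created above exist.
# check_scratch_layout is a hypothetical helper, not part of the MLPerf repository.
check_scratch_layout() {
    # Use the first argument if given, otherwise fall back to MLPERF_SCRATCH_PATH.
    scratch_root="${1:-${MLPERF_SCRATCH_PATH:-}}"
    scratch_missing=0
    if [ -z "$scratch_root" ]; then
        echo "MLPERF_SCRATCH_PATH is not set" >&2
        return 2
    fi
    for scratch_dir in data data/imagenet models preprocessed_data; do
        if [ ! -d "$scratch_root/$scratch_dir" ]; then
            echo "missing: $scratch_root/$scratch_dir" >&2
            scratch_missing=1
        fi
    done
    # Returns 0 when every directory is present, 1 otherwise.
    return "$scratch_missing"
}

# Usage: check_scratch_layout && echo "scratch layout looks complete"
```

The function returns a non-zero status and lists each missing directory on stderr, so it can gate the later steps in a script.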
Get the rest of the datasets from inside the container:
make prebuild
make download_data BENCHMARKS="resnet50 bert rnnt 3d-unet"
make download_model BENCHMARKS="resnet50 bert rnnt 3d-unet"
make preprocess_data BENCHMARKS="resnet50 bert rnnt 3d-unet"
make build
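The three data-preparation targets above share the same benchmark list, so they can be driven from a single loop. The sketch below assumes it is run from the scripts folder inside the container, after make prebuild; `prepare_benchmarks` and the `MAKE` override are illustrative names introduced here, not part of the repository.

```shell
# Sketch: run the three data-preparation targets with one shared benchmark list.
# prepare_benchmarks is a hypothetical helper; MAKE can be overridden for a dry run.
prepare_benchmarks() {
    # Default to the benchmark list used in the commands above.
    prep_benchmarks="${1:-resnet50 bert rnnt 3d-unet}"
    for prep_target in download_data download_model preprocess_data; do
        # Stop at the first failing target so later steps do not run on partial data.
        "${MAKE:-make}" "$prep_target" BENCHMARKS="$prep_benchmarks" || return 1
    done
}

# Usage: prepare_benchmarks                (all four benchmarks)
#        prepare_benchmarks "resnet50"     (a single benchmark)
```

Setting `MAKE=echo` before calling the function prints the make arguments instead of running them, which is a cheap way to review what will be executed.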
Run the benchmark
Finally, run the benchmark with the make run command; an example is given below. A result is only correct if it is reported as “VALID”. If it is reported as “INVALID”, modify the corresponding value in the config files and run the benchmark again.
make run RUN_ARGS="--benchmarks=bert --scenarios=offline --config_ver=default,high_accuracy,triton,high_accuracy_triton"
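The VALID/INVALID check can also be scripted. The sketch below assumes the run produces a LoadGen summary log containing a line of the form "Result is : VALID"; the log location depends on your run, so the path is passed in by the caller, and `result_is_valid` is a hypothetical helper rather than part of the repository.

```shell
# Sketch: report whether a summary log contains a VALID result.
# Assumption: the log has a line like "Result is : VALID" or "Result is : INVALID".
# result_is_valid is a hypothetical helper; the caller supplies the log path.
result_is_valid() {
    summary_file="$1"
    if [ ! -f "$summary_file" ]; then
        echo "no such file: $summary_file" >&2
        return 2
    fi
    # "INVALID" does not match because VALID must directly follow the colon and spaces.
    grep -Eq 'Result is *: *VALID' "$summary_file"
}

# Usage: result_is_valid path/to/summary.txt || echo "adjust the config and rerun"
```

The function returns 0 for a VALID result, 1 when no VALID line is found, and 2 when the file does not exist.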