A quick start guide to benchmarking AI models in Azure: MLPerf Inference v2.1
Published Sep 08 2022 10:00 AM

By Hugo Affaticati, Technical Program Manager




Azure is pleased to share results from our MLPerf Inference v2.1 submission. For this submission, we benchmarked our NC A100 v4-series, NDm A100 v4-series, and NVads A10 v5-series. They are powered by the latest NVIDIA A100 PCIe Tensor Core GPUs, NVIDIA A100 SXM Tensor Core GPUs, and NVIDIA A10 Tensor Core GPUs, respectively. These offerings are our flagship virtual machine (VM) types for AI inference and training, and they let our customers address inferencing needs ranging from 1/6 of a GPU to eight GPUs. All three series are available, making AI inference accessible to all. We are excited to see what new breakthroughs our customers will make using these VMs.


In this document, we share our AI benchmark results from MLPerf Inference v2.1, along with the best practices and configuration details you need to replicate them. The results show that Azure is committed to providing our customers with the latest GPU offerings, that these offerings are in line with on-premises performance, that they are available on demand in the cloud, and that they scale to AI workloads of all sizes.



MLPerf™ from MLCommons®


MLCommons® is an open engineering consortium of AI leaders from academia, research labs, and industry whose mission is to “build fair and useful benchmarks” that provide unbiased evaluations of training and inference performance for hardware, software, and services, all conducted under prescribed conditions. MLPerf™ Inference benchmarks consist of real-world, compute-intensive AI workloads chosen to best simulate customers’ needs. MLPerf™ tests are transparent and objective, so technology decision makers can rely on the results to make informed buying decisions.



Highlights of Performance Results


Highlights of the results obtained with the MLPerf Inference v2.1 benchmarks are shown below.


  1. The NC A100 v4-series achieved 54.2K+ samples/s in the RNN-T offline scenario
  2. The NDm A100 v4-series achieved 26+ samples/s in the 3D U-Net offline scenario
  3. The NVads A10 v5-series achieved 24.7K+ queries/s in the ResNet50 server scenario

Full results are available on the MLCommons® website.



How to replicate the results in Azure



Deploy and set up a virtual machine on Azure by following the guide “Getting started with the NC A100 v4-series”.


Set up the environment:

Once your machine is deployed and configured, get the scripts from the MLPerf Inference v2.1 results repository and move into the Azure folder.

cd /mnt/resource_nvme
git clone https://github.com/mlcommons/inference_results_v2.1.git
cd inference_results_v2.1/closed/Azure

Create folders for the data and get the ResNet50 data:

export MLPERF_SCRATCH_PATH=/mnt/resource_nvme/scratch
mkdir -p "$MLPERF_SCRATCH_PATH/data/imagenet" && cd "$MLPERF_SCRATCH_PATH/data/imagenet"

Download the ImageNet data, which is available online, into this imagenet folder, then go back to the scripts folder.

cd /mnt/resource_nvme/inference_results_v2.1/closed/Azure
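
Before continuing, it can help to confirm that the dataset folder actually contains files, since the later steps read from it. The snippet below is a minimal sketch: check_dataset_dir is a hypothetical helper, not part of the MLPerf scripts, and it only checks that the folder exists and is not empty. It does not verify that the ImageNet layout is what the benchmark expects.

```shell
# Hypothetical helper (not part of the MLPerf scripts): succeeds only if
# the given directory exists and contains at least one entry.
check_dataset_dir() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    # ls -A lists every entry except . and .. ; empty output means an empty folder.
    if [ -z "$(ls -A "$dir")" ]; then
        echo "empty: $dir"
        return 1
    fi
    echo "ok: $dir"
}

# Example:
# check_dataset_dir "$MLPERF_SCRATCH_PATH/data/imagenet"
```

The function returns a non-zero status for a missing or empty folder, so it can also be used as a guard in a longer setup script.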

Build and launch the container with make prebuild, then get the rest of the datasets and build the benchmarks from inside the container:

make prebuild
make download_data BENCHMARKS="resnet50 bert rnnt 3d-unet"
make download_model BENCHMARKS="resnet50 bert rnnt 3d-unet"
make preprocess_data BENCHMARKS="resnet50 bert rnnt 3d-unet"
make build
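
Each of these steps depends on the previous one, and the download and preprocessing steps can take a long time. As a rough sketch, the steps that follow make prebuild could be chained so that the sequence stops at the first failure. run_steps is a hypothetical wrapper, not part of the MLPerf scripts, and the example assumes you are already in the container shell that make prebuild opened.

```shell
# Hypothetical wrapper (not part of the MLPerf scripts): run each command
# string in order and stop at the first one that fails.
run_steps() {
    for step in "$@"; do
        echo ">>> $step"
        if ! sh -c "$step"; then
            echo "FAILED: $step"
            return 1
        fi
    done
    echo "all steps completed"
}

# Example (assumes you are inside the container started by make prebuild):
# run_steps 'make download_data BENCHMARKS="resnet50 bert rnnt 3d-unet"' \
#           'make download_model BENCHMARKS="resnet50 bert rnnt 3d-unet"' \
#           'make preprocess_data BENCHMARKS="resnet50 bert rnnt 3d-unet"' \
#           'make build'
```

Stopping early avoids building against partially downloaded or unprocessed data; the failed step is printed so it can be rerun on its own.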

Run the benchmark

Finally, run the benchmark with the make run command; an example is given below. The reported value is only correct if the result is “VALID”. If the result is “INVALID”, modify the value in the config files and run the benchmark again.

make run RUN_ARGS="--benchmarks=bert --scenarios=offline --config_ver=default,high_accuracy,triton,high_accuracy_triton"
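
To check the outcome of a run without reading the full console output, the status can be pulled from the LoadGen summary file. The sketch below assumes the summary is named mlperf_log_summary.txt and contains a line such as "Result is : VALID"; result_status is a hypothetical helper, and LOG_DIR in the example is an assumption that should point to wherever your run wrote its logs.

```shell
# Hypothetical helper: print VALID or INVALID for one LoadGen summary file.
# Assumes the file contains a line such as "Result is : VALID".
result_status() {
    summary="$1"
    sed -n 's/^Result is *: *\([A-Z]*\).*$/\1/p' "$summary" | head -n 1
}

# Example: report every summary found under a log directory.
# LOG_DIR is an assumption; set it to the folder holding your run logs.
# find "$LOG_DIR" -name mlperf_log_summary.txt | while read -r f; do
#     echo "$(result_status "$f")  $f"
# done
```

Any summary reported as INVALID points to a benchmark and scenario whose config value needs adjusting before the run is repeated.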

