This blog was authored by Aimee Garcia, Program Manager - AI Benchmarking, with additional contributions from Program Managers Daramfon Akpan, Gaurav Uppal, and Hugo Affaticati.
Microsoft Azure's publicly available AI inferencing capabilities are led by the NDm A100 v4, ND A100 v4, and NC A100 v4 virtual machines (VMs), powered by the latest NVIDIA A100 Tensor Core GPUs. These results showcase Azure's commitment to making AI inferencing accessible to all researchers and users while continuing to raise the bar for inference performance on Azure. The announcement is also available on Azure.com.
Highlights from the results
ND96amsr A100 v4 powered by NVIDIA A100 80G SXM Tensor Core GPU
| Benchmark | Samples/second (offline) | Queries/second (server) | Scenarios |
|---|---|---|---|
| bert-99 | 27.5K+ | ~22.5K | Offline and server |
| resnet | 300K+ | ~200K+ | Offline and server |
| 3d-unet | 24.87 | | Offline |
NC96ads A100 v4 powered by NVIDIA A100 80G PCIe Tensor Core GPU
| Benchmark | Samples/second (offline) | Queries/second (server) | Scenarios |
|---|---|---|---|
| bert-99.9 | ~6.3K | ~5.3K | Offline and server |
| resnet | 144K | ~119.6K | Offline and server |
| 3d-unet | 11.7 | | Offline |
The results were generated by deploying the environment on these VM offerings using Azure's Ubuntu 18.04-HPC marketplace image.
Steps to reproduce the results in Azure
Set up and connect to a VM via SSH. First, decide which VM series you want to benchmark; an example Azure CLI deployment is sketched below.
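For illustration, a VM can be deployed with the Azure CLI along these lines. The resource group, VM name, and region below are placeholders; the image URN refers to Azure's Ubuntu 18.04-HPC marketplace image.
# Use Standard_ND96asr_v4 or Standard_NC96ads_A100_v4 instead to benchmark the other series.
az vm create \
  --resource-group myResourceGroup \
  --name mlperf-vm \
  --location southcentralus \
  --size Standard_ND96amsr_A100_v4 \
  --image microsoft-dsvm:ubuntu-hpc:1804:latest \
  --admin-username azureuser \
  --generate-ssh-keys
# Then connect to the VM:
ssh azureuser@<public-ip-of-the-vm>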
Set up the dependencies
cd /mnt
nvidia-smi
sudo wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
# Download the CUDA 11.6.1 local installer that the next command installs (URL follows NVIDIA's standard local_installers path).
sudo wget https://developer.download.nvidia.com/compute/cuda/11.6.1/local_installers/cuda-repo-ubuntu1804-11-6-local_11.6.1-510.47.03-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804-11-6-local_11.6.1-510.47.03-1_amd64.deb
sudo apt-key add /var/cuda-repo-ubuntu1804-11-6-local/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda
sudo dpkg -P moby-cli
curl https://get.docker.com | sh && sudo systemctl --now enable docker
sudo chmod 777 /var/run/docker.sock
docker info
Check the reported Docker version in the output: it should be 20.10.12 or newer. Then reboot the VM:
sudo reboot
Once the VM is back online, reconnect via SSH and confirm the GPUs are visible:
nvidia-smi
cd /mnt
sudo touch nvme.sh
sudo vi nvme.sh
Paste in the following script, which stripes the VM's local NVMe disks into a single RAID 0 volume mounted at /mnt/resource_nvme:
#!/bin/bash
# Stripe all local NVMe disks into one RAID 0 array and mount it.
NVME_DISKS_NAME=$(ls /dev/nvme*n1)
NVME_DISKS=$(ls -latr /dev/nvme*n1 | wc -l)
echo "Number of NVMe Disks: $NVME_DISKS"
if [ "$NVME_DISKS" == "0" ]
then
    exit 0
else
    mkdir -p /mnt/resource_nvme
    # Needed in case something did not unmount as expected. This will delete any data that may be left behind.
    mdadm --stop /dev/md*
    mdadm --create /dev/md128 -f --run --level 0 --raid-devices $NVME_DISKS $NVME_DISKS_NAME
    mkfs.xfs -f /dev/md128
    mount /dev/md128 /mnt/resource_nvme
fi
chmod 1777 /mnt/resource_nvme
sudo sh nvme.sh
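To confirm the script created and mounted the array, you can check:
cat /proc/mdstat
df -h /mnt/resource_nvme
Next, point Docker's data root at the new volume: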
sudo vi /etc/docker/daemon.json
Add this line after the first curly bracket:
"data-root": "/mnt/resource_nvme/data",
sudo systemctl restart docker
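You can confirm Docker picked up the new location:
docker info | grep "Docker Root Dir"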
cd resource_nvme
export MLPERF_SCRATCH_PATH=/mnt/resource_nvme/scratch
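The benchmark code and instructions live in the MLCommons inference results repository. Assuming the v2.0 submission round (the repository name and the closed/Azure path are assumptions based on that round, not spelled out in the original steps), cloning it into the NVMe volume looks like:
git clone https://github.com/mlcommons/inference_results_v2.0.git
cd inference_results_v2.0/closed/Azure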
vi README.md
Follow the instructions in the README to download the models and datasets and to run the benchmarks.
Below are graphs showing the achieved results for the NDm A100 v4, NC A100 v4, and ND A100 v4 VMs. The units are throughput per second: samples/second for the offline scenario and queries/second for the server scenario.
More about MLPerf
To learn more about MLCommons benchmarks, visit the MLCommons website.