By: Mark Gitau, Software Engineer, and Hugo Affaticati, Technical Program Manager 2
Useful resources:
New NC H100 v5-series: Microsoft NC H100 v5-series
Thought leadership article: Aka.ms/Blog/MLPerfInfv4
Azure results for MLPerf Inference: MLPerf Inference V4.0
Submission to GitHub: mlcommons/inference_results_v4.0
Microsoft Azure has delivered industry-leading results for AI inference workloads among cloud service providers in the most recent MLPerf Inference results published by MLCommons. The Azure results were achieved with the new NC H100 v5 Virtual Machines (VMs) and reinforce Azure's commitment to designing AI infrastructure that is optimized for training and inference in the cloud. This document walks through the steps to reproduce the Llama 2 (70B) results from MLPerf Inference v4.0 on the new NC H100 v5 virtual machines.
Prerequisites:
Step 1: Deploy and set up a virtual machine on Azure.
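The results were obtained on VMs from the NC H100 v5-series. As a minimal sketch, such a VM can be deployed with the Azure CLI; the resource group name, VM name, region, admin user, and disk size below are illustrative placeholders, and the exact size name and regional availability should be checked against the NC H100 v5-series documentation:
# Assumes the Azure CLI is installed and you are logged in (az login)
az group create --name mlperf-rg --location southcentralus
az vm create \
  --resource-group mlperf-rg \
  --name mlperf-nch100v5 \
  --size Standard_NC80adis_H100_v5 \
  --image Ubuntu2204 \
  --admin-username azureuser \
  --os-disk-size-gb 256 \
  --generate-ssh-keys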
Step 2: Mount the NVMe disks
cd /mnt
sudo vi nvme.sh
Copy and paste the following mounting script:
#!/bin/bash
NVME_DISKS_NAME=`ls /dev/nvme*n1`
NVME_DISKS=`ls -latr /dev/nvme*n1 | wc -l`
echo "Number of NVMe Disks: $NVME_DISKS"
if [ "$NVME_DISKS" == "0" ]
then
    exit 0
else
    mkdir -p /mnt/resource_nvme
    # Needed in case something did not unmount as expected. This will delete any data that may be left behind.
    mdadm --stop /dev/md*
    # Stripe all NVMe disks into a single RAID 0 array and mount it
    mdadm --create /dev/md128 -f --run --level 0 --raid-devices $NVME_DISKS $NVME_DISKS_NAME
    mkfs.xfs -f /dev/md128
    mount /dev/md128 /mnt/resource_nvme
fi
chmod 1777 /mnt/resource_nvme
Run the script to mount the disks:
sudo sh nvme.sh
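To confirm that the RAID 0 array was created and mounted correctly, you can check the block device and the free space on the new mount point (an optional sanity check, not part of the original script):
# The array should appear as /dev/md128 mounted on /mnt/resource_nvme
lsblk /dev/md128
df -h /mnt/resource_nvme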
Step 3: Set up docker
Update the Docker root directory in the Docker daemon configuration file:
sudo vi /etc/docker/daemon.json
Paste the following lines:
{
    "data-root": "/mnt/resource_nvme/data",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
Verify the Docker installation, then restart and enable the Docker service:
docker --version
sudo systemctl restart docker
sudo systemctl enable docker
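After the restart, you can confirm that Docker picked up the new data root and the NVIDIA runtime. This assumes Docker and the NVIDIA container toolkit are already installed on the VM (for example via an Azure HPC/AI image); install them first if they are not present:
# Should report Docker Root Dir: /mnt/resource_nvme/data and list nvidia among the runtimes
sudo docker info | grep -E "Docker Root Dir|Runtimes"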
Add your user to the docker group so Docker can be run without sudo:
sudo groupadd docker
sudo usermod -aG docker $USER
newgrp docker
You should not have any permission issues when running
docker info
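As an additional check, you can verify that containers can access the GPUs through the NVIDIA runtime; the CUDA base image tag below is only an example, and any recent CUDA image should work:
# Should print the nvidia-smi table with the H100 GPUs visible inside the container
docker run --rm --runtime=nvidia --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi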
Set up the environment:
Once your machine is deployed and configured, move to the NVMe mount and clone the MLPerf Inference v4.0 results repository to get the scripts.
cd /mnt/resource_nvme
git clone https://github.com/mlcommons/inference_results_v4.0.git
cd inference_results_v4.0/closed/Azure
Create folders for the data and model:
export MLPERF_SCRATCH_PATH=/mnt/resource_nvme/scratch
mkdir -p $MLPERF_SCRATCH_PATH
mkdir $MLPERF_SCRATCH_PATH/data $MLPERF_SCRATCH_PATH/models $MLPERF_SCRATCH_PATH/preprocessed_data
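The MLPERF_SCRATCH_PATH variable must be set in every shell in which the make commands below are run. As an optional convenience, you can persist it in your shell profile:
echo 'export MLPERF_SCRATCH_PATH=/mnt/resource_nvme/scratch' >> ~/.bashrc
source ~/.bashrc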
To download the model and the preprocessed dataset, please follow the steps in code/llama2-70b/tensorrt/README.md (a license is required).
Prebuild the container on the instance.
make prebuild
The system name is saved under code/common/systems/custom_list.py and the configuration files are located in configs/[benchmark]/[scenario]/custom.py.
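To see which system entries and tuning parameters ship with the submission, you can inspect those files directly (the grep pattern below is just an example):
# List the H100-based system definitions and the Llama 2 configuration files
grep -n "H100" code/common/systems/custom_list.py
ls configs/llama2-70b/*/custom.py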
You can now build the container:
make build
Run the benchmark
Finally, run the benchmark with the make run command below. The performance result should match Azure’s official results published for MLPerf Inference v4.0.
make run RUN_ARGS="--benchmarks=llama2-70b --scenarios=offline,server --config_ver=high_accuracy"
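The command above runs the performance tests. To also produce the accuracy results required for the high-accuracy category, the harness accepts a test-mode argument (the flag follows NVIDIA's harness, on which the Azure submission code is based); logs for both runs are written under build/logs:
make run RUN_ARGS="--benchmarks=llama2-70b --scenarios=offline,server --config_ver=high_accuracy --test_mode=AccuracyOnly"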