Contributors: Davide Vanzo, Yuval Mazor, Jesse Lopez
DeepSeek-R1 is an open-weights reasoning model built on DeepSeek-V3, designed for conversational AI, coding, and complex problem-solving. It has gained significant attention beyond the AI/ML community due to its strong reasoning capabilities, often competing with OpenAI’s models. One of its key advantages is that it can be run locally, giving users full control over their data.
The NDv5 MI300X VM features 8x AMD Instinct MI300X GPUs, each equipped with 192GB of HBM3 and interconnected via Infinity Fabric 3.0. With up to 5.2 TB/s of memory bandwidth per GPU, the MI300X provides the necessary capacity and speed to process large models efficiently - enabling users to run DeepSeek-R1 at full precision on a single VM.
In this blog post, we’ll walk you through the steps to provision an NDv5 MI300X instance on Azure and run DeepSeek-R1 for inference using the SGLang inference framework.
Launching an NDv5 MI300X VM
Prerequisites
- Check that your subscription has sufficient vCPU quota for the VM family “Standard NDISr MI300X v5” (see the quota documentation).
- If needed, contact your Microsoft account representative to request quota increase.
- A Bash terminal with the Azure CLI installed and logged into the appropriate tenant; alternatively, Azure Cloud Shell can be used.
Provision the VM
1. Using Azure CLI, create an Ubuntu-22.04 VM on ND_MI300x_v5:
az group create --location <REGION> -n <RESOURCE_GROUP_NAME>
az vm create --name mi300x --resource-group <RESOURCE_GROUP_NAME> --location <REGION> --image Canonical:0001-com-ubuntu-server-jammy:22_04-lts-gen2:22.04.202410020 --size Standard_ND96isr_MI300X_v5 --security-type Standard --os-disk-size-gb 256 --os-disk-delete-option Delete --admin-username azureadmin --ssh-key-values <PUBLIC_SSH_PATH>
2. Log into the VM via SSH and downgrade the kernel to version 5.15.0. Note that the disk UUID in the GRUB_DEFAULT entry below is specific to this image; adjust it to match the UUID reported in /boot/grub/grub.cfg on your VM:
sudo apt install -y linux-image-5.15.0-1073-azure linux-modules-5.15.0-1073-azure linux-headers-5.15.0-1073-azure linux-tools-5.15.0-1073-azure
sudo sed -i "s|GRUB_DEFAULT=.*|GRUB_DEFAULT='gnulinux-advanced-0b58668a-ba2e-4a00-b89a-3354b7a547d4>gnulinux-5.15.0-1073-azure-advanced-0b58668a-ba2e-4a00-b89a-3354b7a547d4'|g" /etc/default/grub
sudo update-grub
Remove the current kernel packages, making sure to answer “No” when asked whether to abort the kernel removal:
sudo apt remove -y linux-azure-6.5-cloud-tools-6.5.0-1025 linux-azure-6.5-headers-6.5.0-1025 linux-azure-6.5-tools-6.5.0-1025 linux-cloud-tools-6.5.0-1025-azure linux-headers-6.5.0-1025-azure linux-image-6.5.0-1025-azure linux-modules-6.5.0-1025-azure linux-tools-6.5.0-1025-azure
sudo reboot
After rebooting, confirm that the kernel in use is version 5.15.0:
uname -r
3. Install the required drivers and software:
git clone --branch mi300x https://github.com/vanzod/azhpc-images.git
cd azhpc-images/ubuntu/ubuntu-22.x/ubuntu-22.04-hpc
sudo ./install.sh AMD
Create a custom VM image (optional)
For additional flexibility when deploying VMs in the future, we recommend creating a custom VM image. The image will then contain all the required components, so they do not have to be installed every time a new VM is deployed.
1. Generalize the VM for image creation:
sudo rm -f ~/.bash_history
sudo waagent -force -deprovision+user
2. Using Azure CLI, deallocate and mark the VM as generalized:
az vm deallocate --resource-group <RESOURCE_GROUP_NAME> --name mi300x
az vm generalize --resource-group <RESOURCE_GROUP_NAME> --name mi300x
3. Create a shared image gallery and save the custom image:
az sig create --resource-group <RESOURCE_GROUP_NAME> --gallery-name mi300xImages
VMID=$(az vm get-instance-view -g <RESOURCE_GROUP_NAME> -n mi300x --query id -o tsv)
az sig image-definition create --resource-group <RESOURCE_GROUP_NAME> --gallery-name mi300xImages --gallery-image-definition Ubuntu-2204-ROCm --publisher <PUBLISHER_NAME> --offer ubuntu2204 --sku ROCm --os-type Linux --hyper-v-generation v2 --features SecurityType=Standard
az sig image-version create --resource-group <RESOURCE_GROUP_NAME> --gallery-name mi300xImages --gallery-image-definition Ubuntu-2204-ROCm --gallery-image-version 1.0.0 --target-regions <REGION> --replica-count 1 --virtual-machine ${VMID}
4. Delete the virtual machine and the associated resources:
az vm delete --yes --resource-group <RESOURCE_GROUP_NAME> --name mi300x
az network nic delete --resource-group <RESOURCE_GROUP_NAME> --name mi300xNIC
az network public-ip delete --resource-group <RESOURCE_GROUP_NAME> --name mi300xPIP
5. Retrieve the image resource ID that will be needed when creating a new VM (referencing the image definition deploys its latest available version):
az sig image-definition show --resource-group <RESOURCE_GROUP_NAME> --gallery-name mi300xImages --gallery-image-definition Ubuntu-2204-ROCm --query id --output tsv
6. Create a new VM from custom image:
az vm create --name <VM_NAME> --resource-group <RESOURCE_GROUP_NAME> --location <REGION> --image <CUSTOM_IMAGE_RESOURCE_ID> --size Standard_ND96isr_MI300X_v5 --security-type Standard --os-disk-size-gb 256 --admin-username <USERNAME> --ssh-key-values <PUBLIC_SSH_PATH>
Additional preparation
Beyond provisioning the VM, there are additional steps to prepare the environment for optimally running DeepSeek-R1 and other AI workloads. These include configuring the node's 8 NVMe disks in a RAID-0 array that serves as the cache location for Docker and Hugging Face.
The following steps assume you are connected to the VM and working in a Bash shell.
1. Prepare the NVMe disks in a RAID-0 configuration
sudo mkdir -p /mnt/resource_nvme/
sudo mdadm --create /dev/md128 -f --run --level 0 --raid-devices 8 $(ls /dev/nvme*n1)
sudo mkfs.xfs -f /dev/md128
sudo mount /dev/md128 /mnt/resource_nvme
sudo chmod 1777 /mnt/resource_nvme
2. Configure Hugging Face to use the RAID-0 array. This environment variable should also be propagated to any containers pulling images or data from Hugging Face.
mkdir -p /mnt/resource_nvme/hf_cache
export HF_HOME=/mnt/resource_nvme/hf_cache
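Since export only makes the variable visible to child processes of the current shell, a quick sketch of confirming it will be inherited by anything launched from this shell (the container later receives it explicitly via -e):

```shell
# Export HF_HOME so child processes inherit it; huggingface_hub reads
# this variable to decide where to cache downloaded models.
export HF_HOME=/mnt/resource_nvme/hf_cache
# A child process (here, python3) sees the exported value:
python3 -c 'import os; print(os.environ["HF_HOME"])'
# prints: /mnt/resource_nvme/hf_cache
```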
3. Configure Docker to use the RAID-0
mkdir -p /mnt/resource_nvme/docker
sudo tee /etc/docker/daemon.json > /dev/null <<EOF
{
"data-root": "/mnt/resource_nvme/docker"
}
EOF
sudo chmod 0644 /etc/docker/daemon.json
sudo systemctl restart docker
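A malformed daemon.json will prevent the Docker daemon from starting. As a sanity check, the snippet below (a standalone sketch that inlines the same JSON rather than reading /etc/docker/daemon.json) confirms the configuration parses and points the data root at the RAID-0 mount:

```shell
# Parse the daemon.json contents with python3 and print the configured
# data root; a JSON syntax error here would also break dockerd.
cat <<'EOF' | python3 -c 'import json, sys; print(json.load(sys.stdin)["data-root"])'
{
  "data-root": "/mnt/resource_nvme/docker"
}
EOF
# prints: /mnt/resource_nvme/docker
```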
Using MI300X
If you are familiar with Nvidia's CUDA tools and environment, AMD provides equivalents as part of the ROCm stack:

| MI300X + ROCm | Nvidia + CUDA | Description |
| --- | --- | --- |
| rocm-smi | nvidia-smi | CLI for monitoring the system and making changes |
| rccl | nccl | Library for communication between GPUs |
Running DeepSeek-R1
1. Pull the container image. The originally recommended image was lmsysorg/sglang:v0.4.2-rocm620, but optimizations have since been made that are not yet available in a tagged version of the lmsysorg/sglang repository. Until one is published, we temporarily recommend the image jesselopezmicrosoft/sglang:mi300x. It is ~20 GB in size, so the pull may take a few minutes.
docker pull jesselopezmicrosoft/sglang:mi300x
2. Start the SGLang server. The model (~642 GB) is downloaded the first time it is launched and will take at least a few minutes to download. Once the application outputs “The server is fired up and ready to roll!”, you can begin making queries to the model.
docker run \
--device=/dev/kfd \
--device=/dev/dri \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--group-add video \
--privileged \
--shm-size 32g \
--ipc=host \
-p 30000:30000 \
-v /mnt/resource_nvme:/mnt/resource_nvme \
-e HF_HOME=/mnt/resource_nvme/hf_cache \
-e HSA_NO_SCRATCH_RECLAIM=1 \
jesselopezmicrosoft/sglang:mi300x \
python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-R1 --tp 8 --trust-remote-code --host 0.0.0.0
3. You can now make queries to DeepSeek-R1. For example, these requests from another shell on the same host return the model metadata and generate a sample response.
curl http://localhost:30000/get_model_info
{"model_path":"deepseek-ai/DeepSeek-R1","tokenizer_path":"deepseek-ai/DeepSeek-R1","is_generation":true}
curl http://localhost:30000/generate -H "Content-Type: application/json" -d '{ "text": "Once upon a time,", "sampling_params": { "max_new_tokens": 16, "temperature": 0.6 } }'
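The /generate endpoint returns a JSON object whose "text" field holds the completion. As a small sketch of extracting just that field with python3 (using a hypothetical sample response in place of live server output):

```shell
# Hypothetical sample of a /generate response body; a live response also
# populates meta_info with token counts and the finish reason.
response='{"text":" there was a kingdom by the sea.","meta_info":{}}'
# Pull out just the generated text:
echo "$response" | python3 -c 'import json, sys; print(json.load(sys.stdin)["text"])'
# prints:  there was a kingdom by the sea.
```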
Conclusion
In this post, we walked through how to run the full-size 671B-parameter DeepSeek-R1 model on a single Azure NDv5 MI300X instance, from setting up the machine and installing the necessary drivers to executing the model. Happy inferencing!
Updated Feb 12, 2025