Azure High Performance Computing (HPC) Blog

Running DeepSeek-R1 on a single NDv5 MI300X VM

jesselopez
Feb 01, 2025

Contributors: Davide Vanzo, Yuval Mazor, Jesse Lopez

 

DeepSeek-R1 is an open-weights reasoning model built on DeepSeek-V3, designed for conversational AI, coding, and complex problem-solving. It has gained significant attention beyond the AI/ML community due to its strong reasoning capabilities, often competing with OpenAI’s models. One of its key advantages is that it can be run locally, giving users full control over their data. 

The NDv5 MI300X VM features 8x AMD Instinct MI300X GPUs, each equipped with 192GB of HBM3 and interconnected via Infinity Fabric 3.0. With up to 5.2 TB/s of memory bandwidth per GPU, the MI300X provides the necessary capacity and speed to process large models efficiently - enabling users to run DeepSeek-R1 at full precision on a single VM. 
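
In back-of-the-envelope terms: 8 GPUs × 192 GB = 1,536 GB of aggregate HBM3 per VM, comfortably above the ~642 GB of DeepSeek-R1 weights downloaded later in this walkthrough, which is why a single VM suffices without quantizing the model.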

In this blog post, we’ll walk you through the steps to provision an NDv5 MI300X instance on Azure and run DeepSeek-R1 for inference using the SGLang inference framework. 

Launching an NDv5 MI300X VM 

Prerequisites 

  • Check that your subscription has sufficient vCPU quota for the VM family “StandardNDISv5MI300X” (see Quota documentation). A quick CLI check is shown after this list. 
  • If needed, contact your Microsoft account representative to request a quota increase. 
  • A Bash terminal with Azure CLI installed and logged into the appropriate tenant. Alternatively, Azure Cloud Shell can be used. 
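
One way to check the available quota from the CLI (the grep filter is illustrative; match it against the family name reported for your subscription):

az vm list-usage --location <REGION> --output table | grep -i MI300X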

Provision the VM

1. Using Azure CLI, create an Ubuntu 22.04 VM of size Standard_ND96isr_MI300X_v5:

az group create --location <REGION> -n <RESOURCE_GROUP_NAME> 
az vm create --name mi300x --resource-group <RESOURCE_GROUP_NAME> --location <REGION> --image Canonical:0001-com-ubuntu-server-jammy:22_04-lts-gen2:22.04.202410020 --size Standard_ND96isr_MI300X_v5 --security-type Standard --os-disk-size-gb 256 --os-disk-delete-option Delete --admin-username azureadmin --ssh-key-values <PUBLIC_SSH_PATH>
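
Once provisioning completes, you can retrieve the VM’s public IP address for SSH (an optional convenience step):

az vm show -d --resource-group <RESOURCE_GROUP_NAME> --name mi300x --query publicIps -o tsv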

2. Log into the VM via SSH and downgrade the kernel to version 5.15.0: 

sudo apt install -y linux-image-5.15.0-1073-azure linux-modules-5.15.0-1073-azure linux-headers-5.15.0-1073-azure linux-tools-5.15.0-1073-azure 
sudo sed -i "s|GRUB_DEFAULT=.*|GRUB_DEFAULT='gnulinux-advanced-0b58668a-ba2e-4a00-b89a-3354b7a547d4>gnulinux-5.15.0-1073-azure-advanced-0b58668a-ba2e-4a00-b89a-3354b7a547d4'|g" /etc/default/grub 
sudo update-grub 

Remove the current kernel packages. Be sure to answer “No” when asked whether to abort the kernel removal: 

sudo apt remove -y linux-azure-6.5-cloud-tools-6.5.0-1025 linux-azure-6.5-headers-6.5.0-1025 linux-azure-6.5-tools-6.5.0-1025 linux-cloud-tools-6.5.0-1025-azure linux-headers-6.5.0-1025-azure linux-image-6.5.0-1025-azure linux-modules-6.5.0-1025-azure linux-tools-6.5.0-1025-azure 
sudo reboot 


After rebooting, confirm that the kernel in use is version 5.15.0: 

uname -r
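
It should report the 5.15.0-1073 kernel installed above:

5.15.0-1073-azure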

3. Install the required drivers and software: 

git clone --branch mi300x https://github.com/vanzod/azhpc-images.git 
cd azhpc-images/ubuntu/ubuntu-22.x/ubuntu-22.04-hpc 
sudo ./install.sh AMD
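
As an optional verification, rocm-smi from the newly installed ROCm stack should list all eight MI300X GPUs:

rocm-smi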

Create a custom VM image (optional) 

 
For additional flexibility in future deployments, we recommend creating a custom VM image. The image then contains all the required components, so they do not have to be reinstalled every time a new VM is deployed. 
 

1. Generalize the VM for image creation: 

sudo rm -f ~/.bash_history
sudo waagent -force -deprovision+user 

2. Using Azure CLI, deallocate and mark the VM as generalized: 

az vm deallocate --resource-group <RESOURCE_GROUP_NAME> --name mi300x 
az vm generalize --resource-group <RESOURCE_GROUP_NAME> --name mi300x 

 

3. Create a shared image gallery and save the custom image: 

az sig create --resource-group <RESOURCE_GROUP_NAME> --gallery-name mi300xImages 
VMID=$(az vm get-instance-view -g <RESOURCE_GROUP_NAME> -n mi300x --query id -o tsv) 
az sig image-definition create --resource-group <RESOURCE_GROUP_NAME> --gallery-name mi300xImages --gallery-image-definition Ubuntu-2204-ROCm --publisher <PUBLISHER_NAME> --offer ubuntu2204 --sku ROCm --os-type Linux --hyper-v-generation v2 --features SecurityType=Standard 
az sig image-version create --resource-group <RESOURCE_GROUP_NAME> --gallery-name mi300xImages --gallery-image-definition Ubuntu-2204-ROCm --gallery-image-version 1.0.0 --target-regions <REGION> --replica-count 1 --virtual-machine ${VMID}

4. Delete the virtual machine and the associated resources: 

az vm delete --yes --resource-group <RESOURCE_GROUP_NAME> --name mi300x 
az network nic delete --resource-group <RESOURCE_GROUP_NAME> --name mi300xNIC 
az network public-ip delete --resource-group <RESOURCE_GROUP_NAME> --name mi300xPIP

5. Retrieve the gallery image resource ID that will be needed when creating a new VM: 

az sig image-definition show --resource-group <RESOURCE_GROUP_NAME> --gallery-name mi300xImages --gallery-image-definition Ubuntu-2204-ROCm --query id --output tsv

6. Create a new VM from custom image: 

az vm create --name <VM_NAME> --resource-group <RESOURCE_GROUP_NAME> --location <REGION> --image <CUSTOM_IMAGE_RESOURCE_ID> --size Standard_ND96isr_MI300X_v5 --security-type Standard --os-disk-size-gb 256 --admin-username <USERNAME> --ssh-key-values <PUBLIC_SSH_PATH>  

Additional preparation

Beyond provisioning the VM, there are additional steps to prepare the environment to optimally run DeepSeek-R1 or other AI workloads, including setting up the 8 NVMe disks on the node in a RAID-0 configuration to act as the cache location for Docker and Hugging Face. 

The following steps assume you have connected to the VM and are working in a Bash shell.

1. Prepare the NVMe disks in a RAID-0 configuration  

sudo mkdir -p /mnt/resource_nvme/
sudo mdadm --create /dev/md128 -f --run --level 0 --raid-devices 8 $(ls /dev/nvme*n1)  
sudo mkfs.xfs -f /dev/md128 
sudo mount /dev/md128 /mnt/resource_nvme 
sudo chmod 1777 /mnt/resource_nvme  
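
As an optional sanity check, confirm that the array is active and mounted with the expected capacity:

cat /proc/mdstat 
df -h /mnt/resource_nvme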

2. Configure Hugging Face to use the RAID-0. This environment variable should also be propagated to any containers pulling models or data from Hugging Face.

mkdir -p /mnt/resource_nvme/hf_cache 
export HF_HOME=/mnt/resource_nvme/hf_cache 
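
The export above only lasts for the current session; to persist it across logins you could, for example, append it to your shell profile:

echo 'export HF_HOME=/mnt/resource_nvme/hf_cache' >> ~/.bashrc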

3. Configure Docker to use the RAID-0

mkdir -p /mnt/resource_nvme/docker 
sudo tee /etc/docker/daemon.json > /dev/null <<EOF 
{ 
    "data-root": "/mnt/resource_nvme/docker" 
} 
EOF 
sudo chmod 0644 /etc/docker/daemon.json 
sudo systemctl restart docker 
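
To confirm that Docker has picked up the new data root, inspect its runtime info (the “Docker Root Dir” field is standard docker info output):

docker info | grep "Docker Root Dir"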

Using MI300X 

If you are familiar with the Nvidia CUDA tools and environment, AMD provides equivalents as part of the ROCm stack:

MI300X + ROCm    Nvidia + CUDA    Description 
rocm-smi         nvidia-smi       CLI for monitoring the system and making changes 
rccl             nccl             Library for communication between GPUs 
 

Running DeepSeek-R1 

1. Pull the container image. The originally recommended image was lmsysorg/sglang:v0.4.2-rocm620, but optimizations have been made that are not yet available from a tagged version in the lmsysorg/sglang repository. As such, we are temporarily recommending use of the image jesselopezmicrosoft/sglang:mi300x until a tagged version is available from lmsysorg/sglang. It is ~20 GB in size, so it may take a few minutes to download.

docker pull jesselopezmicrosoft/sglang:mi300x

2. Start the SGLang server. The model (~642 GB) is downloaded the first time the server is launched, which will take at least a few minutes. Once the application outputs “The server is fired up and ready to roll!”, you can begin making queries to the model. 

docker run \
  --device=/dev/kfd \
  --device=/dev/dri \
  --security-opt seccomp=unconfined \
  --cap-add=SYS_PTRACE \
  --group-add video \
  --privileged \
  --shm-size 32g \
  --ipc=host \
  -p 30000:30000 \
  -v /mnt/resource_nvme:/mnt/resource_nvme \
  -e HF_HOME=/mnt/resource_nvme/hf_cache \
  -e HSA_NO_SCRATCH_RECLAIM=1 \
  jesselopezmicrosoft/sglang:mi300x \
  python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-R1 --tp 8 --trust-remote-code --host 0.0.0.0 

3. You can now make queries to DeepSeek-R1. For example, the following requests, issued from another shell on the same host, return model metadata and generate a sample response.

curl http://localhost:30000/get_model_info 
{"model_path":"deepseek-ai/DeepSeek-R1","tokenizer_path":"deepseek-ai/DeepSeek-R1","is_generation":true} 
curl http://localhost:30000/generate -H "Content-Type: application/json" -d '{ "text": "Once upon a time,", "sampling_params": { "max_new_tokens": 16, "temperature": 0.6 } }'
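
SGLang also exposes an OpenAI-compatible API on the same port, so existing OpenAI-style clients can target the server directly (a sketch; consult the SGLang documentation for the routes supported by your image version):

curl http://localhost:30000/v1/chat/completions -H "Content-Type: application/json" -d '{ "model": "deepseek-ai/DeepSeek-R1", "messages": [{"role": "user", "content": "Why is the sky blue?"}], "max_tokens": 64, "temperature": 0.6 }'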

Conclusion 

In this post, we detail how to run the full-size 671B DeepSeek-R1 model on a single Azure NDv5 MI300X instance. This includes setting up the machine, installing the necessary drivers, and executing the model. Happy inferencing!

Updated Feb 12, 2025 
Version 5.0