This blog post presents practical techniques for optimizing the performance of DeepEP on Azure-based GPU clusters. DeepEP is a high-performance communication library designed to accelerate Mixture-of-Experts (MoE) models through efficient expert parallelism. It leverages NVSHMEM for one-sided GPU communication, enabling low-latency, host-bypass data transfers across nodes. The focus of this post is on affinity-aware optimization, demonstrating how to align processes with the NUMA topology, GPUs, and network interfaces to minimize communication overhead. We describe code-level modifications using psutil, libnuma, and NVSHMEM environment variables to set CPU core, GPU, and memory affinities during initialization, ensuring optimal hardware placement. These enhancements significantly improve DeepEP's communication efficiency and overall performance when deployed for distributed training on Azure.
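To make the approach concrete, here is a minimal sketch of what such affinity-aware initialization can look like on a Linux node, assuming one process per GPU identified by `local_rank`, with `psutil` and `pynvml` installed and `libnuma` available on the system. The helper names (`gpu_numa_node`, `node_cpus`, `bind_process`), the `mlx5_<rank>` NIC naming, and the choice of NVSHMEM variable are illustrative assumptions, not DeepEP's own API; verify the device names and environment variables against your cluster and NVSHMEM build.

```python
# Sketch of affinity-aware initialization (assumptions: Linux sysfs layout,
# libnuma present, one process per GPU; helper names are illustrative).
import ctypes
import os

import psutil
import pynvml


def gpu_numa_node(local_rank: int) -> int:
    """Look up the NUMA node of a GPU through its PCI address in sysfs."""
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(local_rank)
    bus_id = pynvml.nvmlDeviceGetPciInfo(handle).busId
    if isinstance(bus_id, bytes):
        bus_id = bus_id.decode()
    # NVML reports "00000000:0B:00.0"; sysfs uses the short lowercase form.
    sysfs_id = bus_id[-12:].lower()
    with open(f"/sys/bus/pci/devices/{sysfs_id}/numa_node") as f:
        node = int(f.read())
    return max(node, 0)  # sysfs reports -1 when no NUMA info is exposed


def node_cpus(node: int) -> list[int]:
    """Parse a NUMA node's core list from sysfs, e.g. '0-23,48-71'."""
    with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
        spec = f.read().strip()
    cpus: list[int] = []
    for part in spec.split(","):
        lo, _, hi = part.partition("-")
        cpus.extend(range(int(lo), int(hi or lo) + 1))
    return cpus


def bind_process(local_rank: int) -> None:
    """Pin this rank's CPU, memory, and (assumed) NIC to its GPU's NUMA node."""
    node = gpu_numa_node(local_rank)

    # 1) CPU affinity: restrict the process to cores local to the GPU (psutil).
    psutil.Process().cpu_affinity(node_cpus(node))

    # 2) Memory affinity: prefer allocations on the same node (libnuma via ctypes).
    libnuma = ctypes.CDLL("libnuma.so.1")
    if libnuma.numa_available() >= 0:
        libnuma.numa_set_preferred(node)

    # 3) NIC selection for NVSHMEM: the mlx5_<rank> mapping is an assumption
    #    about the VM topology; check your cluster's HCA names before using it.
    os.environ.setdefault("NVSHMEM_HCA_LIST", f"mlx5_{local_rank}")
```

In this sketch, `bind_process(local_rank)` would be called early in each worker, before the process group is created and before DeepEP/NVSHMEM buffers are allocated, so that subsequent host allocations and communication threads inherit the CPU and memory affinities of the GPU's NUMA node.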
Updated May 21, 2025
Version 3.0