In the rapidly evolving landscape of artificial intelligence (AI), the demand for more powerful and efficient computing resources is ever-increasing. Microsoft is at the forefront of this technological revolution, empowering customers to harness the full potential of their AI workloads with their GPUs. GPU virtualization makes it possible to process massive amounts of data quickly and efficiently.
With Windows Server 2025, Microsoft is introducing multiple new virtualized GPU advancements: GPUs with clustered VMs through Discrete Device Assignment (DDA), GPU Partitioning (GPU-P), and Live Migration for GPU-P. Using GPUs with clustered VMs through DDA is particularly significant in failover clusters, where it offers direct GPU access. These new features benefit compute-heavy workloads, including machine learning and virtual desktop workloads such as CAD (computer-aided design) or FEA (finite element analysis).
NEW! Move Your GPU-Partitioned Devices Quickly with Live Migration
GPU partitioning (GPU-P) allows users to share a single physical GPU device with multiple virtual machines (VMs) by providing each VM with a dedicated portion of the GPU’s capacity, so each VM has the dedicated resources it needs for its specific workload. With a heightened priority on security, GPU-P uses single root I/O virtualization (SR-IOV) to create a hardware-backed security boundary for each VM, preventing unauthorized access from other VMs by ensuring each VM only has access to the specific GPU resources dedicated to it.
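To make this concrete, here is a minimal PowerShell sketch of carving a GPU into partitions and attaching one to a VM, assuming the Hyper-V PowerShell module on a supported host; the VM name "ml-vm" and the partition count are example values, and the exact steps for your hardware are covered in the documentation linked at the end of this post.

```powershell
# List the host's partitionable GPUs and the partition counts they support.
Get-VMHostPartitionableGpu | Format-List Name, ValidPartitionCounts

# Split the first partitionable GPU into four partitions
# (its Name is the device instance path).
$gpu = Get-VMHostPartitionableGpu | Select-Object -First 1
Set-VMHostPartitionableGpu -Name $gpu.Name -PartitionCount 4

# Attach one partition to a stopped VM ("ml-vm" is an example name),
# then confirm the adapter is present.
Add-VMGpuPartitionAdapter -VMName "ml-vm"
Get-VMGpuPartitionAdapter -VMName "ml-vm"
```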
Starting with the Windows Server 2025 and Azure Stack HCI 24H2 OS releases later this year, Live Migration will be enabled for GPU-P devices. Live Migration allows customers to perform maintenance and updates on their VM fleets with minimal workload impact, and it enables the use of cluster-aware updating (CAU) on failover cluster nodes for GPU VMs. CAU automates the updating of cluster nodes by moving workloads and cluster resources to a new node before a patch is applied, so workloads maintain availability with little to no impact. With Live Migration and CAU, customers keep their datacenter fleets secure, updated, and running so they can provide the services their customers rely on.
Figure 1 – Example of GPU partitioning between two VMs
GPU-P brings GPU virtualization into the modern era by no longer requiring an entire GPU to be dedicated to a single VM. The addition of Live Migration ensures customers can maintain their GPU-P workloads without impact while systems are up and running. GPU-P devices and hardware now natively support virtualization, helping drive AI innovation.
Live Migration scenarios for GPU-P include both clustered environments and standalone servers (outside a cluster). GPU-P is coming to Windows Server 2025 and has been enabled and available on Azure Stack HCI since the 22H2 OS release.
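As an illustration of both scenarios, a maintenance pass might look like the sketch below; the cluster, node, and VM names are hypothetical examples, and the Invoke-CauRun options should be tuned to your environment.

```powershell
# Live-migrate a running GPU-P VM off the host about to be patched
# ("ml-vm" and "node2" are example names). This works for standalone
# hosts as well as cluster nodes.
Move-VM -Name "ml-vm" -DestinationHost "node2"

# Alternatively, on a failover cluster, let Cluster-Aware Updating
# drain, patch, and resume each node in turn ("gpu-cluster" is an
# example cluster name).
Invoke-CauRun -ClusterName "gpu-cluster" `
    -CauPluginName "Microsoft.WindowsUpdatePlugin" `
    -MaxFailedNodes 0 -RequireAllNodesOnline -Force
```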
System Requirements for GPU-P
Supported GPU-P Devices
- NVIDIA A2, L4, A10, A16, A40, L40, L40S
Note:
- The GPU devices listed above are those currently supported for GPU-P and GPU-P Live Migration
- This list of GPU devices is expected to expand as IHVs (independent hardware vendors) update their GPU product portfolios. Check with your IHV for the latest devices supported for GPU partitioning.
CPU Requirements
- AMD EPYC 7003 Series and later (also known by codename AMD Milan)
- 5th Generation Intel® Xeon® Scalable Processors and newer (also known by codename Intel Emerald Rapids)
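One quick way to sanity-check a host against these requirements, assuming the Hyper-V role and its PowerShell module are installed, is to confirm SR-IOV support and look for a partitionable GPU:

```powershell
# SR-IOV/IOMMU must be available for GPU-P's hardware-backed isolation;
# IovSupportReasons explains any missing prerequisites.
Get-VMHost | Select-Object IovSupport, IovSupportReasons

# A supported, driver-ready GPU should appear as a partitionable device.
Get-VMHostPartitionableGpu | Select-Object Name, PartitionCount
```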
NEW! Use GPUs with Clustered VMs through Direct Device Assignment
Using GPUs with clustered VMs through DDA allows you to assign one or more entire physical GPUs to a single virtual machine (VM). DDA gives VMs direct access to the physical GPU, resulting in reduced latency and full utilization of the GPU’s capabilities, which is crucial for compute-intensive tasks.
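For a single (non-clustered) VM, the DDA flow follows the pattern below, adapted from the Discrete Device Assignment documentation linked at the end of this post; the GPU friendly name, VM name, and MMIO sizes are example values that vary by device.

```powershell
# Find the GPU and its PCI location path ("NVIDIA A2" is an example name).
$gpu = Get-PnpDevice -FriendlyName "NVIDIA A2"
$locationPath = ($gpu | Get-PnpDeviceProperty DEVPKEY_Device_LocationPaths).Data[0]

# Prepare the VM: DDA requires the TurnOff stop action, and GPUs
# typically need guest-controlled cache types plus enlarged MMIO space.
Set-VM -VMName "cad-vm" -AutomaticStopAction TurnOff
Set-VM -VMName "cad-vm" -GuestControlledCacheTypes $true `
    -LowMemoryMappedIoSpace 1GB -HighMemoryMappedIoSpace 32GB

# Disable the device on the host, dismount it, and assign it to the VM.
Disable-PnpDevice -InstanceId $gpu.InstanceId -Confirm:$false
Dismount-VMHostAssignableDevice -LocationPath $locationPath -Force
Add-VMAssignableDevice -LocationPath $locationPath -VMName "cad-vm"
```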
Figure 2: This diagram shows GPUs used with clustered VMs via DDA, where full physical GPUs are assigned to VMs.
Using GPUs with clustered VMs enables high-compute workloads to be executed within a failover cluster. A failover cluster is a group of independent nodes that work together to increase the availability of clustered roles: if one or more cluster nodes fail, the other nodes take over providing service, which is how failover clusters deliver high availability. By integrating GPUs with clustered VMs, these clusters can now support high-compute workloads on VMs. Failover clusters use GPU pools, which are managed by the cluster. An administrator creates a GPU pool with the same name on each node, adds GPUs to it, and declares each VM’s GPU needs. Once GPUs and VMs are added to the pools, the cluster manages VM placement and GPU assignment. Although live migration is not supported, in the event of a server failure workloads can automatically restart on another node, minimizing downtime and ensuring continuity.
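To make the pooling model concrete, the sketch below follows the clustered-VM GPU flow from the documentation linked at the end of this post; treat the cmdlet set as an assumption to verify there, and note that the pool name ("GpuPool"), VM name, and $instancePath variable are examples.

```powershell
# On every node: dismount the GPU from the host and add it to a pool
# with the same name ($instancePath is the GPU's PCI instance path,
# e.g. from (Get-PnpDevice ...).InstanceId).
Dismount-VMHostAssignableDevice -InstancePath $instancePath -Force
Add-VMHostAssignableDevice -InstancePath $instancePath -ResourcePoolName "GpuPool"

# Declare the VM's GPU needs by assigning it a device from the pool,
# then register it as a clustered role so the cluster manages placement.
Add-VMAssignableDevice -VMName "hpc-vm" -ResourcePoolName "GpuPool"
Add-ClusterVirtualMachineRole -VMName "hpc-vm"
```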
Using GPUs with clustered VMs through DDA will be available in Windows Server 2025 Datacenter and was initially enabled in Azure Stack HCI 22H2.
To use GPUs with clustered VMs, you need a failover cluster running Windows Server 2025 Datacenter edition, with the cluster functional level at Windows Server 2025. Each node in the cluster must have the same setup and the same GPUs in order to enable this failover cluster functionality. DDA does not currently support live migration, and not every GPU supports DDA; to verify whether your GPU works with DDA, contact your GPU manufacturer. Ensure you adhere to the setup guidelines provided by the GPU manufacturer, including installing the manufacturer-specific drivers on each server in the cluster and obtaining manufacturer-specific GPU licensing where applicable.
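Before deploying, you can check the cluster's functional level from PowerShell; the sketch below assumes the FailoverClusters module is available on a cluster node.

```powershell
# The reported functional level must correspond to Windows Server 2025.
Get-Cluster | Select-Object Name, ClusterFunctionalLevel

# After a rolling OS upgrade, raise the level (preview with -WhatIf first).
Update-ClusterFunctionalLevel -WhatIf
```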
For more information on using GPUs with clustered VMs, GPU partitioning, and GPU-P Live Migration, please review our documentation below:
Introducing GPU Innovations with Windows Server 2025 - Microsoft Community Hub
Partition and share GPUs with virtual machines on Hyper-V | Microsoft Learn
Partition and assign GPUs to a virtual machine in Hyper-V | Microsoft Learn
Use GPUs with clustered VMs on Hyper-V | Microsoft Learn
Deploy graphics devices by using Discrete Device Assignment | Microsoft Learn