Bringing GPU acceleration to Windows containers

Apr 03, 2019

At the release of Windows Server 2019 last year, we announced support for a set of hardware devices in Windows containers. One popular type of device missing support at the time: GPUs. We’ve heard frequent feedback that you want hardware acceleration for your Windows container workloads, so today, we’re pleased to announce the first step on that journey: starting in Windows Server 2019, we now support GPU acceleration for DirectX-based apps and frameworks in Windows containers.

The best part is that you can use the Windows Server 2019 build you have today: no new OS patches or configuration changes are necessary. All you need is a new build of Docker and the latest display drivers. Read on for detailed requirements and to learn how you can get started with GPU-accelerated DirectX in Windows containers today.

Background: Why GPU acceleration?

Containers are an excellent tool for packaging and deploying many kinds of workloads. For many of these, traditional CPU compute resources are sufficient. However, for a certain class of workload, the massively parallel compute power offered by GPUs (graphics processing units) can speed up operations by orders of magnitude, bringing down cost and improving throughput immensely.

GPUs are already a common tool for many popular workloads, from traditional rendering and simulation to machine learning training and inference. With today’s announcement, we’re unlocking new app scenarios for Windows containers and enabling more applications to be successfully shifted into Windows containers.

GPU-accelerated DirectX, Windows ML, and more

For some users, DirectX conjures associations with gaming. But DirectX is about more than games—it also powers a large ecosystem of multimedia, design, computation, and simulation frameworks and applications.

As we looked at adding GPU support to Windows containers, it was clear that starting with the DirectX APIs—the foundation of accelerated graphics, compute, and AI on Windows—was a natural first step.

By enabling GPU acceleration for DirectX, we’ve also enabled GPU acceleration for the frameworks built on top of it. One such framework is Windows ML, a set of APIs providing fast and efficient AI inferencing capabilities. With GPU acceleration in Windows containers, developers now have access to a first-class inferencing runtime that can be accelerated across a broad range of capable GPU hardware.

Usage

On a system meeting the requirements (see below), start a container with hardware-accelerated DirectX support by specifying the --device option at container runtime, as follows:

docker run --isolation process --device class/5B45201D-F2F2-4F3B-85BB-30FF1F953599 <your Docker image>

Note that this does not assign GPU resources exclusively to the container, nor does it prevent GPU access on the host. Rather, GPU resources are scheduled dynamically across the host and containers in much the same way as they are scheduled among apps running on your personal device today. You can have several Windows containers running on a host, each with hardware-accelerated DirectX capabilities.
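If you launch containers from a script rather than by hand, the command above can be assembled programmatically. The following is a minimal sketch in Python. The function name build_gpu_run_command and the image name "winml-runner" are illustrative assumptions; only the --isolation process and --device class/<GUID> arguments come from the command shown above.

```python
from typing import List, Sequence

# Device interface class GUID taken from the docker run command above.
DIRECTX_GPU_CLASS_GUID = "5B45201D-F2F2-4F3B-85BB-30FF1F953599"


def build_gpu_run_command(image: str, extra_args: Sequence[str] = ()) -> List[str]:
    """Return the argv list for a process-isolated container with DirectX GPU access."""
    if not image:
        raise ValueError("image must be a non-empty string")
    return [
        "docker", "run",
        "--isolation", "process",                        # process isolation mode
        "--device", f"class/{DIRECTX_GPU_CLASS_GUID}",   # expose the GPU device class
        *extra_args,                                     # optional additional docker run flags
        image,
    ]


if __name__ == "__main__":
    # "winml-runner" is a placeholder image name, not a published image.
    print(" ".join(build_gpu_run_command("winml-runner")))
```

The function only builds the argument list; on a host that meets the requirements, the list could be passed to subprocess.run to start the container.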

Requirements

For this feature to work, your environment must meet the following requirements:

The container host must be running Windows Server 2019 or Windows 10, version 1809 or newer.
The container base image must be mcr.microsoft.com/windows:1809 or newer. Windows Server Core and Nano Server container images are not currently supported.
The container must be run in process isolation mode. Hyper-V isolation mode is not currently supported.
The container host must be running Docker Engine 19.03 or newer.
The container host must have a GPU with display drivers that support WDDM 2.5 or newer.
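The checklist above can be expressed as a small validation function. This is a minimal sketch, not an official tool. The function and parameter names are illustrative assumptions, version strings are assumed to be purely numeric and dot-separated, the host OS release is modelled as a plain number such as 1809, and the base image tag is not checked (only the repository name is).

```python
from typing import List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as "19.03" into a comparable tuple (19, 3)."""
    return tuple(int(part) for part in text.split("."))


def unmet_requirements(
    host_os_version: int,   # Windows release number of the host, e.g. 1809
    base_image: str,        # e.g. "mcr.microsoft.com/windows:1809"
    isolation: str,         # "process" or "hyperv"
    docker_version: str,    # e.g. "19.03"
    wddm_version: str,      # e.g. "2.5"
) -> List[str]:
    """Return a human-readable list of requirements that are not satisfied."""
    problems = []
    if host_os_version < 1809:
        problems.append("host must be Windows Server 2019 or Windows 10, version 1809 or newer")
    # Compare only the repository name; the tag after ":" is deliberately ignored here.
    repository = base_image.lower().split(":", 1)[0]
    if repository != "mcr.microsoft.com/windows":
        problems.append("base image must be mcr.microsoft.com/windows")
    if isolation.lower() != "process":
        problems.append("container must run in process isolation mode")
    if parse_version(docker_version) < (19, 3):
        problems.append("Docker Engine must be 19.03 or newer")
    if parse_version(wddm_version) < (2, 5):
        problems.append("display drivers must be WDDM 2.5 or newer")
    return problems


if __name__ == "__main__":
    for problem in unmet_requirements(1809, "mcr.microsoft.com/windows:1809",
                                      "hyperv", "19.03", "2.5"):
        print("Unmet:", problem)
```

Returning a list of messages rather than a single boolean makes it easier to report every unmet requirement at once.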

To check the WDDM version of your display drivers, run the DirectX Diagnostic Tool (dxdiag.exe) on your container host. In the tool’s “Display” tab, look for the “Driver Model” entry in the “Drivers” section.
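This check can also be scripted. The sketch below assumes that a text report saved from the DirectX Diagnostic Tool contains one line per display adapter of the form "Driver Model: WDDM 2.5". That report layout is an assumption and may differ between Windows versions. The function names and the sample text are illustrative only.

```python
import re
from typing import List, Tuple

# Assumed line format in a saved dxdiag text report: "Driver Model: WDDM <major>.<minor>"
_DRIVER_MODEL = re.compile(r"Driver Model:\s*WDDM\s+(\d+)\.(\d+)")


def find_wddm_versions(report_text: str) -> List[Tuple[int, int]]:
    """Return the (major, minor) WDDM version of every adapter found in the report."""
    return [(int(major), int(minor)) for major, minor in _DRIVER_MODEL.findall(report_text)]


def supports_gpu_containers(report_text: str) -> bool:
    """True if at least one display adapter reports WDDM 2.5 or newer."""
    return any(version >= (2, 5) for version in find_wddm_versions(report_text))


if __name__ == "__main__":
    # Hypothetical report excerpt, used only to exercise the parser.
    sample = "Card name: Example GPU\nDriver Model: WDDM 2.6\n"
    print(find_wddm_versions(sample), supports_gpu_containers(sample))
```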

Getting started

Operating system support for this feature is already complete and broadly available as part of Windows Server 2019 and Windows 10, version 1809. Formal Docker support is scheduled for the upcoming Docker EE Engine 19.03 release. Until then, if you’re eager to try out the feature early, you can check out our sample on GitHub and follow the README instructions to get started. We’ll show you how to acquire a nightly build of Docker and use it to run a containerized Windows ML inferencing app with GPU acceleration.

Going forward

We look forward to getting your feedback on this experience. Please leave a comment below or tweet us with your thoughts. What are the next things you’d like to be able to do with GPU acceleration in containers on Windows?

Cheers,

Rick Manning, Graphics PM

@CraigWilhite, Windows Container PM

Updated Apr 10, 2019

Version 4.0


34 Comments

lostmsu
Copper Contributor
Mar 14, 2023
Is it possible to create a virtual DirectX output device for a container to enable the use of Desktop Duplication API?
nkef_gr
Copper Contributor
Dec 23, 2021
rickman_MSFT, Craig Wilhite
What is the status of support for video encoding using Media Foundation Transforms?

jamessxxoo
Copper Contributor
Dec 23, 2021
It seems the GPU is slower. What could be the reason?

(base) PS C:\Projects\Virtualization-Documentation\windows-container-samples\directx> docker run --isolation process --device class/5B45201D-F2F2-4F3B-85BB-30FF1F953599 winml-runner

Created LearningModelDevice with CPU device

Created LearningModelDevice with GPU: Intel(R) UHD Graphics 630
Loading model (path = C:\App\tiny_yolov2\model.onnx)...
=================================================================
Name: Example Model
Author: OnnxMLTools
Version: 0
Domain: onnxconverter-common
Description: The Tiny YOLO network from the paper 'YOLO9000: Better, Faster, Stronger' (2016), arXiv:1612.08242
Path: C:\App\tiny_yolov2\model.onnx
Support FP16: false

Input Feature Info:
Name: image
Feature Kind: Image (Height: 416, Width:  416)

Output Feature Info:
Name: grid
Feature Kind: Float

=================================================================

Binding (device = CPU, iteration = 1, inputBinding = CPU, inputDataType = Tensor, deviceCreationLocation = WinML)...[SUCCESS]
Evaluating (device = CPU, iteration = 1, inputBinding = CPU, inputDataType = Tensor, deviceCreationLocation = WinML)...[SUCCESS]
Binding and Evaluating 999 more times...
Results (device = CPU, numIterations = 1000, inputBinding = CPU, inputDataType = Tensor, deviceCreationLocation = WinML):

First Iteration Performance (load, bind, session creation, and evaluate):
  Load: 86.8923 ms
  Bind: 0.1449 ms
  Session Creation: 40.6742 ms
  Evaluate: 34.9017 ms

  Working Set Memory usage (evaluate): 27.8672 MB
  Working Set Memory usage (load, bind, session creation, and evaluate): 155.789 MB
  Peak Working Set Memory Difference (load, bind, session creation, and evaluate): 187.664 MB

  Dedicated Memory usage (evaluate): 0 MB
  Dedicated Memory usage (load, bind, session creation, and evaluate): 0 MB

  Shared Memory usage (evaluate): 0 MB
  Shared Memory usage (load, bind, session creation, and evaluate): 0 MB

Average Performance excluding first iteration. Iterations 2 to 1000. (Iterations greater than 1 only bind and evaluate)
  Average Bind: 0.0815504 ms
  Average Evaluate: 26.6584 ms

  Average Working Set Memory usage (bind): 3.91016e-06 MB
  Average Working Set Memory usage (evaluate): 0.00594735 MB

  Average Dedicated Memory usage (bind): 0 MB
  Average Dedicated Memory usage (evaluate): 0 MB

  Average Shared Memory usage (bind): 0 MB
  Average Shared Memory usage (evaluate): 0 MB



Binding (device = GPU, iteration = 1, inputBinding = CPU, inputDataType = Tensor, deviceCreationLocation = WinML)...[SUCCESS]
Evaluating (device = GPU, iteration = 1, inputBinding = CPU, inputDataType = Tensor, deviceCreationLocation = WinML)...[SUCCESS]
Binding and Evaluating 999 more times...
Results (device = GPU, numIterations = 1000, inputBinding = CPU, inputDataType = Tensor, deviceCreationLocation = WinML):

First Iteration Performance (load, bind, session creation, and evaluate):
  Load: 86.8923 ms
  Bind: 1.3861 ms
  Session Creation: 4486.82 ms
  Evaluate: 38.5659 ms

  Working Set Memory usage (evaluate): 65.3438 MB
  Working Set Memory usage (load, bind, session creation, and evaluate): 245.676 MB
  Peak Working Set Memory Difference (load, bind, session creation, and evaluate): 186.244 MB

  Dedicated Memory usage (evaluate): 0 MB
  Dedicated Memory usage (load, bind, session creation, and evaluate): 0 MB

  Shared Memory usage (evaluate): 0 MB
  Shared Memory usage (load, bind, session creation, and evaluate): 0 MB

Average Performance excluding first iteration. Iterations 2 to 1000. (Iterations greater than 1 only bind and evaluate)
  Average Bind: 0.260335 ms
  Average Evaluate: 27.0888 ms

  Average Working Set Memory usage (bind): 0 MB
  Average Working Set Memory usage (evaluate): -0.0607131 MB

  Average Dedicated Memory usage (bind): 0 MB
  Average Dedicated Memory usage (evaluate): 0 MB

  Average Shared Memory usage (bind): 0 MB
  Average Shared Memory usage (evaluate): 0 MB

LayneBernardo
Copper Contributor
Nov 18, 2020
>> invalid argument "class/5B45201D-F2F2-4F3B-85BB-30FF1F953599" for "--device" flag: class/5B45201D-F2F2-4F3B-85BB-30FF1F953599 is not an absolute path

I am getting this same error running Docker v19.03.13 on Windows 10. I have enabled experimental features both in the "Experimental Features" section of the settings as well as enabling the "experimental" flag in the Docker engine. Not sure what I'm doing wrong here. Any tips?
Thomasin
Copper Contributor
Sep 05, 2019
rickman_MSFT
Thanks for your reply. This sounds very promising!

We just saw that https://github.com/aarnaud/k8s-directx-device-plugin seems to solve the problem, but it requires https://github.com/kubernetes/kubernetes/pull/80917. Contrary to my previous post, I now think that a device plugin is the right approach, and this particular implementation looks good to me. What do you think about it?

Maybe Microsoft can throw in its weight and support the acceptance of that pull request? 😉
rickman_MSFT
Microsoft
Aug 29, 2019
Thomasin said:

Is it possible to launch such GPU accelerated Windows containers via Kubernetes?

Unfortunately, Kubernetes does not yet support resource allocation and enablement of GPU acceleration for Windows containers. It's something we're actively looking into, however, as we know many container customers prefer to deploy their containers using Kubernetes.
Thomasin
Copper Contributor
Aug 20, 2019
Craig Wilhite, rickman_MSFT May I ask a question?
Is it possible to launch such GPU accelerated Windows containers via Kubernetes?

Kubernetes seems to launch containers already in process isolation mode, corresponding to the --isolation process option you are mentioning in the article. But what is the Kubernetes equivalent of the --device class/5B45201D-F2F2-4F3B-85BB-30FF1F953599 option? We see no need for a Kubernetes device plugin, which seems to be the wrong way anyway, since it would assign GPU resources exclusively to a container.

We can successfully run our application in a GPU accelerated Windows container via docker run. We can also create Windows containers via Kubernetes, but without GPU access yet. Optimistically hoping that it is only a small thing we are missing.
Thomasin
Copper Contributor
Aug 16, 2019
Thanks for the great work!
This is really exciting!
Yurk_
Copper Contributor
Aug 02, 2019
I second the need to support CUDA in Docker. Our researchers need to accelerate machine learning code written, for example, in TensorFlow using CUDA, or run scientific simulations that use the GPU. It would be great if CUDA and GPU passthrough could be supported in Docker and in WSL 2. Currently we are forced to use Linux because of this, but we would prefer to use Windows Server if GPU passthrough becomes possible.
rickman_MSFT
Microsoft
Jul 25, 2019
docker1575 wrote:

Does it support to install NVIDIA Driver in the container to using NVIDIA??

Thanks for your question. I can see two possible ways to interpret your question, so I will answer both interpretations:

1. Does this enable my Windows containers to get hardware acceleration on NVIDIA GPUs?

Yes. If you have NVIDIA drivers installed on the container host (that meet the requirements described in the blog post), then when you run a container with the --device parameter as described in the blog post, your apps can get hardware acceleration whenever they use the DirectX graphics and compute APIs inside the container. You don't even need to install those drivers in the container; Docker automatically makes the right drivers from the host available to the container.

2. Does this enable my Windows containers to get hardware-accelerated CUDA?

No. Hardware acceleration is currently supported only for the DirectX APIs (and higher-level APIs built on DirectX); it does not include CUDA or similar APIs. We've heard lots of feedback that customers are interested in those, and we're actively investigating support for them.