Microsoft Tech Community
Bringing GPU acceleration to Windows containers
Published Apr 03 2019 09:10 AM 42.4K Views
Microsoft

At the release of Windows Server 2019 last year, we announced support for a set of hardware devices in Windows containers. One popular type of device missing support at the time: GPUs. We’ve heard frequent feedback that you want hardware acceleration for your Windows container workloads, so today, we’re pleased to announce the first step on that journey: starting in Windows Server 2019, we now support GPU acceleration for DirectX-based apps and frameworks in Windows containers.

 

The best part is, you can use the Windows Server 2019 build you have today—no new OS patches or configuration is necessary. All you need is a new build of Docker and the latest display drivers. Read on for detailed requirements and to learn how you can get started with GPU accelerated DirectX in Windows containers today.

 

Background: Why GPU acceleration?

 

Containers are an excellent tool for packaging and deploying many kinds of workloads. For many of these, traditional CPU compute resources are sufficient. However, for a certain class of workload, the massively parallel compute power offered by GPUs (graphics processing units) can speed up operations by orders of magnitude, bringing down cost and improving throughput immensely.

 

GPUs are already a common tool for many popular workloads, from traditional rendering and simulation to machine learning training and inference. With today’s announcement, we’re unlocking new app scenarios for Windows containers and enabling more applications to be successfully shifted into Windows containers.

 

GPU-accelerated DirectX, Windows ML, and more

 

For some users, DirectX conjures associations with gaming. But DirectX is about more than games—it also powers a large ecosystem of multimedia, design, computation, and simulation frameworks and applications.

 

As we looked at adding GPU support to Windows containers, it was clear that starting with the DirectX APIs—the foundation of accelerated graphics, compute, and AI on Windows—was a natural first step.

 

By enabling GPU acceleration for DirectX, we’ve also enabled GPU acceleration for the frameworks built on top of it. One such framework is Windows ML, a set of APIs providing fast and efficient AI inferencing capabilities. With GPU acceleration in Windows containers, developers now have access to a first-class inferencing runtime that can be accelerated across a broad set of capable GPU acceleration hardware.

overview-diagram.png

 

Usage

 

On a system meeting the requirements (see below), start a container with hardware-accelerated DirectX support by specifying the --device option at container runtime, as follows:

 

docker run --isolation process --device class/5B45201D-F2F2-4F3B-85BB-30FF1F953599 <your Docker image>

 

Note that this does not assign GPU resources exclusively to the container, nor does it prevent GPU access on the host. Rather, GPU resources are scheduled dynamically across the host and containers in much the same way as they are scheduled among apps running on your personal device today. You can have several Windows containers running on a host, each with hardware-accelerated DirectX capabilities.
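Putting the pieces together, a typical flow looks like the following sketch (winml-sample is a hypothetical image name; the GUID is the DirectX device class GUID from the command above):

```shell
# Build a Windows-based image from a Dockerfile whose base image is
# mcr.microsoft.com/windows:1809 or newer (winml-sample is a placeholder name)
docker build -t winml-sample .

# Run it in process isolation with the DirectX device class GUID,
# which grants the container access to GPU-accelerated DirectX
docker run --isolation process --device class/5B45201D-F2F2-4F3B-85BB-30FF1F953599 winml-sample
```

Because GPU scheduling stays shared, you can repeat the docker run step for several containers and they will share the GPU with each other and with the host.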

 

Requirements

 

For this feature to work, your environment must meet the following requirements:

  • The container host must be running Windows Server 2019 or Windows 10, version 1809 or newer.
  • The container base image must be mcr.microsoft.com/windows:1809 or newer. Windows Server Core and Nano Server container images are not currently supported.
  • The container must be run in process isolation mode. Hyper-V isolation mode is not currently supported.
  • The container host must be running Docker Engine 19.03 or newer.
  • The container host must have a GPU running display drivers version WDDM 2.5 or newer.

To check the WDDM version of your display drivers, run the DirectX Diagnostic Tool (dxdiag.exe) on your container host. In the tool’s “Display” tab, look in the “Drivers” section as indicated below.
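If you prefer a command-line check, dxdiag can also write its report to a text file, which you can then search for the driver model line (a sketch; dxdiag-report.txt is an arbitrary output path, and dxdiag may take a few seconds to finish writing it):

```shell
# Dump the full DxDiag report to a text file
dxdiag /t dxdiag-report.txt

# Search the report for the WDDM driver model, e.g. "Driver Model: WDDM 2.5"
findstr /c:"Driver Model" dxdiag-report.txt
```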

 

dxdiag.png

 

Getting started

 

Operating system support for this feature is already complete and broadly available as part of Windows Server 2019 and Windows 10, version 1809. Formal Docker support is scheduled for the upcoming Docker EE Engine 19.03 release. Until then, if you’re eager to try out the feature early, you can check out our sample on GitHub and follow the README instructions to get started. We’ll show you how to acquire a nightly build of Docker and use it to run a containerized Windows ML inferencing app with GPU acceleration.

 

Going forward

 

We look forward to getting your feedback on this experience. Please leave a comment below or tweet us with your thoughts. What are the next things you’d like to be able to do with GPU acceleration in containers on Windows?

 

Cheers,

Rick Manning, Graphics PM

@CraigWilhite, Windows Container PM

 

34 Comments
Copper Contributor

That is good news! I would be interested in OpenGL and CUDA support; how far away is that? The main use case would be offscreen/remote rendering.

Copper Contributor

Hello. The Docker Hub link that you mention doesn't seem to work; any idea why? mcr.microsoft.com/windows:1809 

Microsoft

@deltaflo thanks! We don't have a timeline yet for frameworks beyond DirectX. I get the OpenGL interest for remote rendering; how does CUDA fit into that for you?

@ionlucas the hyperlink was broken but is now fixed. Thanks for catching that! 

Copper Contributor

 We use CUDA for remote rendering (interactive photo realistic path tracing) and machine learning (TensorFlow).

I think most machine learning people would love CUDA support.

Currently one has to use Linux with nvidia-docker; it would be great if Windows Docker containers could do the same.

Copper Contributor

@Craig Wilhite, does it support hardware acceleration for video encoding with Microsoft Media Foundation and hardware acceleration for video decoding with DXVA 2.0?

Microsoft

@nkef_gr wrote:

 

does it support hardware acceleration for video encoding with Microsoft Media Foundation and hardware acceleration for video decoding with DXVA 2.0?

Thanks for your question! Currently Windows containers do support hardware-accelerated video decode using DXVA, but do not support hardware-accelerated video encode using Media Foundation Transforms. We're investigating enabling the latter in an upcoming release.

Copper Contributor

@rickman_MSFT, that is very good news. Is there any time frame for when the hardware-accelerated video encode using Media Foundation Transforms will be released?

Microsoft

@nkef_gr wrote:

 

is there any time frame for when the hardware-accelerated video encode using Media Foundation Transforms will be released?

Unfortunately I'm not able to share a timeline for this support yet, but we're actively investigating it and would like to enable it as soon as we can.

Copper Contributor

Hello, any news about tensorflow support?

Microsoft

@Daniele_Bulgarelli wrote:

 

Hello, any news about tensorflow support?

TensorFlow itself is just an ML framework that you can accelerate with a GPU runtime as the back-end (so you could, for example, run TensorFlow right now in a Windows container and have it use the CPU--but that's probably not very interesting to you). In the context of running TensorFlow workloads on the GPU, which GPU back-end is of interest to you?

Copper Contributor

@Craig Wilhite Hmm... it would be possible to save a model created with TensorFlow in ONNX, and after that use DirectML to accelerate the inference. That could be interesting. 

Copper Contributor

@Craig Wilhite Hello. Are you providing the GPU to the container, or just doing some kind of mapping for DirectX? I mean, is the GPU available inside the container with this technology? For example, is it possible to add GPU drivers to the container and use a specific API, like the NVIDIA Video Codec SDK?

Microsoft

@Ivan_Ushakov wrote:

@Craig Wilhite Hello. Are you providing the GPU to the container, or just doing some kind of mapping for DirectX? I mean, is the GPU available inside the container with this technology? For example, is it possible to add GPU drivers to the container and use a specific API, like the NVIDIA Video Codec SDK?

Hi Ivan. Thank you for your question. The technology enabling this configuration is essentially "providing a GPU to the container;" it's not just a DirectX API forwarding layer. As of today we only officially support GPU for DirectX inside a Windows container, but we understand there are plenty of container workloads that use non-DirectX APIs. So we're actively investigating support for those non-DirectX APIs, such as NVIDIA's as you mentioned.

Copper Contributor

@rickman_MSFT Hi. Thank you for this explanation. You say you "officially support GPU for DirectX", so could I "unofficially" try to use this technology with other APIs? Or do you restrict this somehow at the Docker level? I just want to do some experiments with other APIs.

Microsoft

@Ivan_Ushakov wrote:

 

"unofficially" I could try to use this technology with other API? Or you restrict this somehow at Docker level? Just want to do some experiments with other API.

I'm not aware of anything at the Docker or OS level intentionally restricting GPU acceleration with these other APIs, however I would not expect them to work. I believe there is missing OS and/or driver functionality that would be required to make these work; this is the focus of our investigations into enabling them.

Copper Contributor

docker run --isolation process --device class/5B45201D-F2F2-4F3B-85BB-30FF1F953599 winml-runner

 

invalid argument "class/5B45201D-F2F2-4F3B-85BB-30FF1F953599" for "--device" flag: class/5B45201D-F2F2-4F3B-85BB-30FF1F953599 is not an absolute path
See 'docker run --help'.

 

How to set the --device?

Microsoft

@docker1575 wrote:

 

invalid argument "class/5B45201D-F2F2-4F3B-85BB-30FF1F953599" for "--device" flag: class/5B45201D-F2F2-4F3B-85BB-30FF1F953599 is not an absolute path
See 'docker run --help'.

 

How to set the --device?

This error occurs if you're not running a Docker client/engine that supports the '--device' arg for Windows containers. This functionality is not yet available in non-edge editions of Docker Desktop for Windows. Have you verified you're running the latest version of Docker Desktop for Windows Edge?

Copper Contributor

@Craig Wilhite 

Thank you very much. I found the mistake and fixed it now.

But I still have a problem now: when I execute "docker run --isolation process --device class/5B45201D-F2F2-4F3B-85BB-30FF1F953599 winml-runner",

PowerShell hangs with no response.

 

Then I opened a new PowerShell window to execute "docker ps -a", and the STATUS is Created!

CONTAINER ID        IMAGE                            COMMAND                    CREATED              STATUS                PORTS               NAMES
17efd899cf91        winml-runner                     "C:/App/WinMLRunner_…"     13 seconds ago       Created                                   mystifying_lumiere

 

And when I execute "docker logs mystifying_lumiere", PowerShell has no response either.

 

What's wrong? Why is the container not in the Running status?

 

My environment is :

HOST OS: Windows 10 Professional 1903

The container base image: mcr.microsoft.com/windows:1903

docker version:

Client: Docker Engine - Community
 Version:           19.03.0-rc2
 API version:       1.40
 Go version:        go1.12.5
 Git commit:        f97efcc
 Built:             Wed Jun  5 01:37:59 2019
 OS/Arch:           windows/amd64
 Experimental:      false
Server: Docker Engine - Community
 Engine:
  Version:          19.03.0-rc2
  API version:      1.40 (minimum version 1.24)
  Go version:       go1.12.5
  Git commit:       f97efcc
  Built:            Wed Jun  5 01:52:18 2019
  OS/Arch:          windows/amd64
  Experimental:     false
 
Driver Model: 2.6 WDDM
Driver: NVIDIA GeForce GTX 1070
Copper Contributor

@Craig Wilhite 

  1. Could you please provide a baseline model inference test comparing DirectX vs. CUDA?
  2. Our code is all in `TensorFlow`, running inside `nvidia-docker`. Can you please elaborate on how hard it would be to port over the models?

Thanks!

Copper Contributor

According to the "Version compatibility" table below:

https://docs.microsoft.com/en-us/virtualization/windowscontainers/deploy-containers/version-compatib...

 

It supports process isolation only when the host OS is Windows Server, as the table below shows. Could you please confirm whether Windows 10 supports process isolation or not?

Container OS version                                 Host OS version                                 Compatibility
Windows Server 2019, version 1903 (Builds 18362.*)   Windows Server, version 1903 (Builds 18362.*)   Supports process or Hyper-V isolation
Windows Server 2019 (Builds 17763.*)                 Windows Server 2019 (Builds 17763.*)            Supports process or Hyper-V isolation
Windows Server, version 1803 (Builds 17134.*)        Windows Server, version 1803 (Builds 17134.*)   Supports process or Hyper-V isolation
Windows Server, version 1709 (Builds 16299.*)        Windows Server, version 1709 (Builds 16299.*)   Supports process or Hyper-V isolation
Windows Server 2016 (Builds 14393.*)                 Windows Server 2016 (Builds 14393.*)            Supports process or Hyper-V isolation
Microsoft

@docker1575, process isolation on Windows 10 should work for dev/test workflows. We no longer outright block process isolation mode on Windows 10 client SKUs in Docker, but it's not a production-supported scenario. The point is, everything described in this blog post should work on Windows 10 if you're running a version 1809 or newer host with the latest Docker engine.

Copper Contributor

@Craig Wilhite 

My environment meets the requirements as shown below, but it still does not run as described in your blog. Do any other settings need to be configured?

 

HOST OS: Windows 10 Professional 1903 18362.30

The container base image: mcr.microsoft.com/windows:1903

docker version:

Client: Docker Engine - Community
 Version:           19.03.0-rc2
 API version:       1.40
 Go version:        go1.12.5
 Git commit:        f97efcc
 Built:             Wed Jun  5 01:37:59 2019
 OS/Arch:           windows/amd64
 Experimental:      false
Server: Docker Engine - Community
 Engine:
  Version:          19.03.0-rc2
  API version:      1.40 (minimum version 1.24)
  Go version:       go1.12.5
  Git commit:       f97efcc
  Built:            Wed Jun  5 01:52:18 2019
  OS/Arch:          windows/amd64
  Experimental:     false
 
Driver Model: 2.6 WDDM
Driver: NVIDIA GeForce GTX 1070
Microsoft

@statikk wrote:

1. Could you please provide a baseline model inference tests on DirectX vs. CUDA?

Are you asking about a performance comparison between DirectX and CUDA? Performance depends on a number of different factors (the model being evaluated, input types, device hardware, graphics drivers, etc.) so results tend to be specific to a developer's unique scenario. However, the developers behind DirectX and the Windows AI stack (WinML, DirectML, and related technologies) work extremely closely with hardware vendors to ensure consistent results and performance across the broad range of Windows devices and GPUs.

 

@statikk wrote:

2. Our code is all in `TensorFlow`, running inside `nvidia-docker`. Can you please elaborate on how hard would it be to port over the models?

Again, this depends on the details of your unique situation, but Microsoft does provide tools for porting models to ONNX, the Open Neural Network Exchange format. You can learn more about model conversion here: Convert ML models to ONNX with WinMLTools.

Copper Contributor

Does it support installing the NVIDIA driver in the container in order to use NVIDIA APIs?

Microsoft

@docker1575 wrote:

Does it support installing the NVIDIA driver in the container in order to use NVIDIA APIs?

Thanks for your question. I can see two possible ways to interpret your question, so I will answer both interpretations:

 

1. Does this enable my Windows containers to get hardware acceleration on NVIDIA GPUs?

Yes. If you have NVIDIA drivers installed on the container host (that meet the requirements described in the blog post), then when you run a container with the --device parameter as described in the blog post, your apps can get hardware acceleration whenever they use the DirectX graphics and compute APIs inside the container. You don't even need to install those drivers in the container; Docker automatically makes the right drivers from the host available to the container.
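As an illustration, you can observe this from a running container: the host's driver files are projected into the container rather than baked into the image. A quick check might look like the following sketch (assumptions: my-container is a hypothetical container name, and the HostDriverStore path reflects how host drivers are typically surfaced to Windows containers):

```shell
# List the driver store that the host projects into the container;
# the host's display driver files should appear here
docker exec my-container cmd /c "dir C:\Windows\System32\HostDriverStore\FileRepository"
```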

 

2. Does this enable my Windows containers to get hardware-accelerated CUDA?

No. Hardware acceleration is currently only supported for the DirectX APIs (and higher-level APIs built on DirectX) but does not include CUDA or similar APIs. We've heard lots of feedback that customers are interested in those, and we're actively investigating support for those.

Copper Contributor

I second the need to support CUDA in Docker. Our researchers need to accelerate machine learning code written e.g. in TensorFlow using CUDA, or run scientific simulations that use the GPU. It would be great if CUDA and GPU passthrough could be supported in Docker and in WSL 2. Currently we are forced to use Linux because of this, but would prefer to use Windows Server, if GPU passthrough becomes possible.

Copper Contributor

Thanks for the great work!
This is really exciting!

Copper Contributor

@Craig Wilhite, @rickman_MSFT May I ask a question?
Is it possible to launch such GPU accelerated Windows containers via Kubernetes?

 

Kubernetes seems to launch containers already in process isolation mode, corresponding to the --isolation process option you are mentioning in the article. But what is the Kubernetes equivalent of the --device class/5B45201D-F2F2-4F3B-85BB-30FF1F953599 option? We see no need for a Kubernetes device plugin, which seems to be the wrong way anyway, since it would assign GPU resources exclusively to a container.

 

We can successfully run our application in a GPU accelerated Windows container via docker run. We can also create Windows containers via Kubernetes, but without GPU access yet. Optimistically hoping that it is only a small thing we are missing.

Microsoft

@Thomasin said:


Is it possible to launch such GPU accelerated Windows containers via Kubernetes?

Unfortunately, Kubernetes does not yet support resource allocation and enablement of GPU acceleration for Windows containers. It's something we're actively looking into, however, as we know many container customers prefer to deploy their containers using Kubernetes.

 

Copper Contributor

@rickman_MSFT 

Thanks for your reply. This sounds very promising!

 

We just saw that this DirectX device plugin for Kubernetes seems to solve the problem, but it requires the pending pull request for supporting device plugins under Windows. Contrary to my previous post, I now think that a device plugin is the right approach, and this particular implementation looks good to me. What do you think about it?


Maybe Microsoft can throw in its weight and support the acceptance of that pull request? ;)

Copper Contributor

>> invalid argument "class/5B45201D-F2F2-4F3B-85BB-30FF1F953599" for "--device" flag: class/5B45201D-F2F2-4F3B-85BB-30FF1F953599 is not an absolute path

 

I am getting this same error running Docker v19.03.13 on Windows 10. I have enabled experimental features both in the "Experimental Features" section of the settings as well as enabling the "experimental" flag in the Docker engine. Not sure what I'm doing wrong here. Any tips?

Copper Contributor

It seems the GPU is slower; what could be the reason?

(base) PS C:\Projects\Virtualization-Documentation\windows-container-samples\directx> docker run --isolation process --device class/5B45201D-F2F2-4F3B-85BB-30FF1F953599 winml-runner

Created LearningModelDevice with CPU device

Created LearningModelDevice with GPU: Intel(R) UHD Graphics 630
Loading model (path = C:\App\tiny_yolov2\model.onnx)...
=================================================================
Name: Example Model
Author: OnnxMLTools
Version: 0
Domain: onnxconverter-common
Description: The Tiny YOLO network from the paper 'YOLO9000: Better, Faster, Stronger' (2016), arXiv:1612.08242
Path: C:\App\tiny_yolov2\model.onnx
Support FP16: false

Input Feature Info:
Name: image
Feature Kind: Image (Height: 416, Width:  416)

Output Feature Info:
Name: grid
Feature Kind: Float

=================================================================

Binding (device = CPU, iteration = 1, inputBinding = CPU, inputDataType = Tensor, deviceCreationLocation = WinML)...[SUCCESS]
Evaluating (device = CPU, iteration = 1, inputBinding = CPU, inputDataType = Tensor, deviceCreationLocation = WinML)...[SUCCESS]
Binding and Evaluating 999 more times...
Results (device = CPU, numIterations = 1000, inputBinding = CPU, inputDataType = Tensor, deviceCreationLocation = WinML):

First Iteration Performance (load, bind, session creation, and evaluate):
  Load: 86.8923 ms
  Bind: 0.1449 ms
  Session Creation: 40.6742 ms
  Evaluate: 34.9017 ms

  Working Set Memory usage (evaluate): 27.8672 MB
  Working Set Memory usage (load, bind, session creation, and evaluate): 155.789 MB
  Peak Working Set Memory Difference (load, bind, session creation, and evaluate): 187.664 MB

  Dedicated Memory usage (evaluate): 0 MB
  Dedicated Memory usage (load, bind, session creation, and evaluate): 0 MB

  Shared Memory usage (evaluate): 0 MB
  Shared Memory usage (load, bind, session creation, and evaluate): 0 MB

Average Performance excluding first iteration. Iterations 2 to 1000. (Iterations greater than 1 only bind and evaluate)
  Average Bind: 0.0815504 ms
  Average Evaluate: 26.6584 ms

  Average Working Set Memory usage (bind): 3.91016e-06 MB
  Average Working Set Memory usage (evaluate): 0.00594735 MB

  Average Dedicated Memory usage (bind): 0 MB
  Average Dedicated Memory usage (evaluate): 0 MB

  Average Shared Memory usage (bind): 0 MB
  Average Shared Memory usage (evaluate): 0 MB



Binding (device = GPU, iteration = 1, inputBinding = CPU, inputDataType = Tensor, deviceCreationLocation = WinML)...[SUCCESS]
Evaluating (device = GPU, iteration = 1, inputBinding = CPU, inputDataType = Tensor, deviceCreationLocation = WinML)...[SUCCESS]
Binding and Evaluating 999 more times...
Results (device = GPU, numIterations = 1000, inputBinding = CPU, inputDataType = Tensor, deviceCreationLocation = WinML):

First Iteration Performance (load, bind, session creation, and evaluate):
  Load: 86.8923 ms
  Bind: 1.3861 ms
  Session Creation: 4486.82 ms
  Evaluate: 38.5659 ms

  Working Set Memory usage (evaluate): 65.3438 MB
  Working Set Memory usage (load, bind, session creation, and evaluate): 245.676 MB
  Peak Working Set Memory Difference (load, bind, session creation, and evaluate): 186.244 MB

  Dedicated Memory usage (evaluate): 0 MB
  Dedicated Memory usage (load, bind, session creation, and evaluate): 0 MB

  Shared Memory usage (evaluate): 0 MB
  Shared Memory usage (load, bind, session creation, and evaluate): 0 MB

Average Performance excluding first iteration. Iterations 2 to 1000. (Iterations greater than 1 only bind and evaluate)
  Average Bind: 0.260335 ms
  Average Evaluate: 27.0888 ms

  Average Working Set Memory usage (bind): 0 MB
  Average Working Set Memory usage (evaluate): -0.0607131 MB

  Average Dedicated Memory usage (bind): 0 MB
  Average Dedicated Memory usage (evaluate): 0 MB

  Average Shared Memory usage (bind): 0 MB
  Average Shared Memory usage (evaluate): 0 MB

 

 

Copper Contributor

@rickman_MSFT  ,  @Craig Wilhite 

What is the status for supporting video encode using Media Foundation Transforms ?

 

Copper Contributor

Is it possible to create a virtual DirectX output device for a container to enable the use of Desktop Duplication API?

Version history
Last update: Apr 10 2019 08:11 AM