NVads A10 v5 series on Azure: Initial Benchmarks for Remote Workstation, Gaming, and Media Workloads
Published Jul 08 2022 11:23 AM
Microsoft

By: Gaurav Uppal, Technical Program Manager, Azure Specialized Compute Benchmarking

 

The NVads A10 v5 series is now publicly available on Azure. These virtual machines (VMs) are powered by NVIDIA A10 Tensor Core GPUs and AMD EPYC 74F3V (Milan) CPUs. They are optimized for GPU-accelerated virtual remote workstations and graphics applications and are available in several sizes, ranging from 1/6 of a single GPU (A10-4Q) to 2 full GPUs (2x A10-24Q). Further details are available on the Microsoft Docs product page.
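As a quick reference, the size range described above can be written as a small lookup table. This is only a sketch: the size names NV6ads and NV72ads and the 24 GB per-GPU figure are assumptions inferred from the A10-4Q and A10-24Q profile names rather than stated in this post, so check them against the product page before relying on them.

```python
from fractions import Fraction

# Assumption: a full A10 exposes 24 GB of GPU memory, as the A10-24Q
# profile name suggests.
FULL_GPU_MEMORY_GB = 24

# GPU share per VM size. NV12ads, NV18ads and NV36ads appear in this post;
# NV6ads and NV72ads are assumed names for the 1/6-GPU and 2-GPU end points.
GPU_SHARE = {
    "NV6ads A10 v5": Fraction(1, 6),
    "NV12ads A10 v5": Fraction(1, 3),
    "NV18ads A10 v5": Fraction(1, 2),
    "NV36ads A10 v5": Fraction(1),
    "NV72ads A10 v5": Fraction(2),
}


def gpu_memory_gb(size: str) -> Fraction:
    """Return the GPU memory implied by a size's share of the A10."""
    return GPU_SHARE[size] * FULL_GPU_MEMORY_GB


for size, share in GPU_SHARE.items():
    print(f"{size}: {share} GPU, ~{gpu_memory_gb(size)} GB GPU memory")
```

Under these assumptions the smallest size works out to 4 GB of GPU memory, which matches the "4Q" in the A10-4Q profile name.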

 

This post presents initial NVads A10 v5 benchmarking results to help you pick the right VM for your workload.

 

View the demo of what an end user would experience using an NV12ads A10 v5 VM (1/3 A10 GPU) running an open-source model in Revit 2023.

 

VDI Protocol: RDP, Azure Region location: SouthCentralUS, Testing location: Atlanta, GA, OS: Windows 11 Enterprise, Connection: Home Wi-Fi (200 Mbps) + VPN

Multi-threaded: Opening the project, saving the project. Single-threaded: All other actions (pivot, pan, etc.).

 

SpecViewPerf

SpecViewPerf2020 1080p Benchmark Comparisons between the NVads A10 v5 and NVv3 series

Relevant for: Users of CAD, AEC and other graphics heavy applications across the board

 

SpecViewPerf is a global standard benchmark that tests the 3D graphics performance of systems running under OpenGL and DirectX by running a number of “viewsets”, each corresponding to a different workstation-level application that represents actual workloads for a variety of industries.

 

 

[Chart: SpecViewPerf 2020 1080p scores for the NVads A10 v5 and NVv3 series]

 

SpecViewPerf scores are the frame rates at which the GPU renders the scenes of a particular viewset. All of these tests were run at 1080p; 4K results will follow in the coming weeks. We expect the performance gap between the two SKUs to be considerably wider at 4K than at 1080p.

 

The NV12s v3 instance offers a full NVIDIA Tesla M60 GPU, while the NV18ads A10 v5 offers ½ of an A10 Tensor Core GPU and the NV36ads A10 v5 offers a full A10 GPU. Across the 8 tested applications, the ½ A10 offering averaged a 1.21x performance improvement over the 1 M60 offering, and the full A10 offering averaged a 2.48x improvement over the same 1 M60 offering.
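The two averages above also hint at how performance changed between the half-GPU and full-GPU sizes. The arithmetic below is a rough sketch that uses only the 1.21x and 2.48x figures quoted in this post; the variable names are illustrative, and a ratio of two averages is not the same as an average of per-application ratios.

```python
# Average SpecViewPerf 2020 speedups over the NV12s v3 (1 M60), as quoted in the post.
half_a10_vs_m60 = 1.21  # NV18ads A10 v5 (1/2 A10)
full_a10_vs_m60 = 2.48  # NV36ads A10 v5 (1 A10)

# Dividing the two gives a rough estimate of the gain from doubling the GPU share.
# This is a ratio of averages, not a per-application measurement.
full_vs_half = full_a10_vs_m60 / half_a10_vs_m60

print(f"Full A10 vs. 1/2 A10: about {full_vs_half:.2f}x")
```

A value close to 2 would suggest that, on average, doubling the GPU share roughly doubled the SpecViewPerf score.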

 

SpecViewPerf2020 runs on a maximum of 1 full GPU, so we tested it on the NVads A10 v5 series from 1/6 of a GPU up to a full (standard memory) GPU and compared the results to ideal (linear) performance scaling.

 

[Graph 1: SpecViewPerf 2020 performance scaling across NVads A10 v5 sizes compared with ideal scaling]

 

As Graph 1 shows, performance exceeded linear scaling for 6 of the 8 tested applications; the other two scaled to about 5x rather than the ideal 6x when going from 1/6 of a GPU to a full GPU. These results can help you determine which NVads A10 v5 size gives the most performance per unit of GPU for the applications relevant to your use case.
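One way to read these scaling results is as an efficiency figure: the observed speedup divided by the increase in GPU share. The helper below is a minimal sketch of that calculation; the function name is hypothetical, and the 5x input is the approximate figure quoted above rather than an exact measurement.

```python
def scaling_efficiency(observed_speedup: float, resource_ratio: float) -> float:
    """Observed speedup as a fraction of ideal (linear) scaling."""
    if resource_ratio <= 0:
        raise ValueError("resource_ratio must be positive")
    return observed_speedup / resource_ratio


# Going from 1/6 of a GPU to a full GPU is a 6x increase in GPU share.
ideal_speedup = 6.0

# The two viewsets that scaled to roughly 5x (approximate figure from the post).
efficiency = scaling_efficiency(5.0, ideal_speedup)
print(f"Scaling efficiency: {efficiency:.0%}")
```

An efficiency above 100% corresponds to the better-than-linear scaling seen for the other six applications.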

 

[Graph 2: SpecViewPerf 2020 performance normalized for cost at each NVads A10 v5 size]

 

Graph 2 shows the performance of each application, as a percentage, normalized for cost at each NVads A10 v5 size. This can help you select the VM with the best cost-to-performance ratio for your workload.
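The post does not spell out how the cost normalization was done. One plausible reading is benchmark score per unit of hourly price, expressed as a percentage of a baseline size. The sketch below illustrates that interpretation only; the function names are hypothetical, and the input numbers are placeholders, not measured results or Azure prices.

```python
def perf_per_cost(score: float, hourly_price: float) -> float:
    """Benchmark score delivered per unit of hourly cost."""
    if hourly_price <= 0:
        raise ValueError("hourly_price must be positive")
    return score / hourly_price


def relative_value(sizes: dict[str, tuple[float, float]], baseline: str) -> dict[str, float]:
    """Express each size's score-per-cost as a percentage of the baseline size."""
    base = perf_per_cost(*sizes[baseline])
    return {
        name: 100.0 * perf_per_cost(score, price) / base
        for name, (score, price) in sizes.items()
    }


# Placeholder (score, hourly price) pairs: NOT measured results or Azure prices.
example_sizes = {
    "small": (10.0, 1.0),
    "large": (55.0, 5.0),
}
print(relative_value(example_sizes, baseline="small"))
```

With your own scores and current prices substituted in, a value above 100 for a size means it delivers more benchmark performance per unit of cost than the baseline.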

 

VRay 5 and CineBench

VRay 5 Benchmark and CineBench R15 OpenGL Comparisons between the NVads A10 v5 series and NCas T4 v3-series

Tests: Rendering Abilities of a given system

Relevant for: Media and Entertainment (FX) users, Rendering across the board

 

Media/Entertainment

VDI Protocol: Teradici

Applications: CineBench R15, VRay 5 Benchmark GPU RTX

 

[Chart: CineBench R15 and VRay 5 Benchmark GPU RTX results for the NVads A10 v5 and NCas T4 v3 series]

 

VRay GPU RTX tests the rendering ability of hardware. The VRay results show that the size 36 offering of the NVads A10 v5 delivers about 2.07x the rendering capability of the size 16 offering of the NCas T4 v3. This benchmark depends heavily on CPU speed for rendering the frames.

 

3DMark is a better benchmark of the ability to visualize the frames. The A10 Tensor Core GPU will therefore be better suited than the NVIDIA T4 Tensor Core GPU for workloads that require CPU rendering plus GPU visualization, but the T4 is directly comparable to ½ of an A10 for pure rendering.
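The "T4 is comparable to ½ of an A10" observation can be checked against the 2.07x VRay figure quoted earlier. The sketch below assumes rendering throughput scales linearly with GPU share, which this post does not measure directly for VRay, so treat the result as an estimate.

```python
# VRay 5 GPU RTX: full A10 (size 36) relative to the T4 (size 16), as quoted in the post.
full_a10_vs_t4 = 2.07

# Assumption: rendering throughput scales linearly with the share of the A10.
half_a10_vs_t4 = full_a10_vs_t4 * 0.5

print(f"Estimated 1/2 A10 vs. T4: {half_a10_vs_t4:.3f}x")
```

An estimate close to 1.0x is consistent with the statement that a T4 and half of an A10 are roughly interchangeable for pure rendering.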

 

These tests were done through Teradici PCoIP, which is supported by the NVads A10 v5.

 

Under the hood

 

The NVads A10 v5 employs GPU-P, also known as virtual GPU, a technology that allows a single GPU node to serve multiple users at once. GPU-P is based on single-root I/O virtualization (SR-IOV), which allows I/O devices to be shared by making a single root function appear as multiple physical devices. It makes use of virtual functions, which map the required hardware resources to each child partition. When a child partition is accessed, the virtual device driver can in many cases access the hardware directly, without having to communicate with the host.
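The idea of virtual functions handing each child partition its own slice of the hardware can be pictured with a toy model. This is a conceptual sketch only, not how the hypervisor or driver is implemented: the class names are hypothetical, and the 24 GB total and six 4 GB slices are illustrative values inferred from the A10-24Q and A10-4Q profile names.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualFunction:
    """A slice of the device that one child partition (VM) sees as its own GPU."""
    owner: str
    memory_gb: int


@dataclass
class PhysicalGpu:
    """Toy stand-in for the physical function that owns the real hardware."""
    memory_gb: int
    max_partitions: int
    partitions: list[VirtualFunction] = field(default_factory=list)

    def free_memory_gb(self) -> int:
        """Memory not yet reserved by any virtual function."""
        return self.memory_gb - sum(vf.memory_gb for vf in self.partitions)

    def assign(self, owner: str, memory_gb: int) -> VirtualFunction:
        """Reserve a dedicated slice; fail rather than oversubscribe."""
        if len(self.partitions) >= self.max_partitions:
            raise RuntimeError("no virtual functions left")
        if memory_gb > self.free_memory_gb():
            raise RuntimeError("not enough GPU memory left")
        vf = VirtualFunction(owner, memory_gb)
        self.partitions.append(vf)
        return vf


# Illustrative values: one 24 GB GPU split into six 4 GB slices.
gpu = PhysicalGpu(memory_gb=24, max_partitions=6)
for i in range(6):
    gpu.assign(f"vm-{i}", 4)

print(f"Partitions: {len(gpu.partitions)}, free memory: {gpu.free_memory_gb()} GB")
```

Because each slice is reserved up front and never shared, one partition's usage cannot eat into another's, which is the property the model is meant to illustrate.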

 

With the NVads A10 v5, a single NVIDIA A10 Tensor Core GPU can be partitioned across as many as 6 virtual machines, each with isolated, predictable performance. Because of SR-IOV, each GPU partition can act as an individual machine that only has access to its own resources. Our team has also worked to improve the predictability, reliability, and simplicity of using the NV-series on Azure. Please check the NVads A10 v5 product documentation for more details, and stay tuned for further deep-dive benchmarking blogs for some of our major customer segments.

Last update: Oct 25 2022 12:54 PM