Initial Benchmarks for Remote Workstation, Gaming, and Media and Entertainment Workloads
By: Gaurav Uppal, Technical Program Manager, Azure Specialized Compute Benchmarking
The NVads A10 v5 series is now publicly available on Azure. These virtual machines (VMs) are powered by NVIDIA A10 Tensor Core GPUs and AMD EPYC 74F3V (Milan) CPUs. They are optimized for GPU-accelerated virtual remote workstations and graphics applications, and they come in a range of sizes, from 1/6 of a single GPU (A10-4Q) to 2 full GPUs (2x A10-24Q). Further details are available on our Microsoft Docs product page.
The purpose of this blog is to share initial NVads A10 v5 performance and benchmarking results to help you pick the right VM size for your workload.
Watch the demo of what an end user would experience on an NV12ads A10 v5 VM (1/3 of an A10 GPU) running an open-source model in Revit 2023.
VDI protocol: RDP, Azure region: SouthCentralUS, Testing location: Atlanta, GA, OS: Windows 11 Enterprise, Connection: home Wi-Fi (200 Mbps) + VPN
Multi-threaded actions: opening and saving the project. Single-threaded actions: all other actions (pivot, pan, etc.).
SPECviewperf 2020 1080p Benchmark Comparisons between the NVads A10 v5 and NVv3 Series
Relevant for: Users of CAD, AEC, and other graphics-heavy applications across the board
SPECviewperf is an industry-standard benchmark that measures the 3D graphics performance of systems running OpenGL and DirectX. It runs a number of "viewsets", each corresponding to a different workstation-level application and representing actual workloads for a variety of industries.
SPECviewperf scores are based on the frame rate at which the GPU renders the scenes of a particular viewset. All of these tests were run at 1080p; 4K results will follow in the coming weeks. We expect the performance gap between the two SKUs to be much larger at 4K than at 1080p.
The NV12s v3 instance offers one full NVIDIA Tesla M60 GPU, while the NV18ads A10 v5 offers half of an A10 Tensor Core GPU and the NV36ads A10 v5 offers a full A10 GPU. Across the 8 tested applications, the half-A10 offering delivered an average 1.21x performance improvement over the single-M60 offering, and the full A10 delivered an average 2.48x improvement.
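The averages above summarize per-application speedups (the A10 score divided by the M60 score). The post does not say which kind of mean was used, so the sketch below computes both the arithmetic and the geometric mean of the speedups. The function names and the input scores are hypothetical placeholders, not measured results. The geometric mean is the common choice for averaging ratios because a single outlier application skews it less.

```python
from math import prod


def speedups(new_scores, baseline_scores):
    """Per-application speedup: new score divided by baseline score."""
    if new_scores.keys() != baseline_scores.keys():
        raise ValueError("both dicts must cover the same applications")
    return {app: new_scores[app] / baseline_scores[app] for app in new_scores}


def arithmetic_mean(values):
    values = list(values)
    return sum(values) / len(values)


def geometric_mean(values):
    # n-th root of the product of n ratios
    values = list(values)
    return prod(values) ** (1 / len(values))


# Hypothetical frame rates for illustration only (not measured results).
baseline = {"app_a": 10.0, "app_b": 20.0}
candidate = {"app_a": 20.0, "app_b": 80.0}

ratios = speedups(candidate, baseline)   # {"app_a": 2.0, "app_b": 4.0}
print(arithmetic_mean(ratios.values()))  # 3.0
print(geometric_mean(ratios.values()))   # about 2.83
```

With real data, the inputs would be the eight per-application SPECviewperf scores for each VM size.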
SPECviewperf 2020 uses at most one full GPU, so we tested NVads A10 v5 sizes ranging from 1/6 of a GPU up to a full GPU (standard memory) and compared the results with ideal linear performance scaling.
As Graph 1 shows, when moving from 1/6 of a GPU to a full GPU (6x the GPU resources), performance exceeded linear scaling for 6 of the 8 tested applications, while the other two scaled to about 5x rather than 6x. These results can help you determine which NVads A10 v5 size gives you the most efficient GPU-to-performance ratio for the applications relevant to your use case.
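Comparing against the ideal line can be expressed as a scaling efficiency: the measured speedup divided by the increase in GPU resources. Going from 1/6 of a GPU to a full GPU is a 6x increase, so an application that reaches about 5x has an efficiency of roughly 83%, and anything at or above 6x meets or beats linear scaling. The function below is a hypothetical helper, and its inputs are illustrative rather than measured scores.

```python
from fractions import Fraction


def scaling_efficiency(score_small, score_large, fraction_small, fraction_large):
    """Measured speedup divided by ideal (linear) speedup.

    fraction_* is the share of one physical GPU, e.g. Fraction(1, 6).
    A result >= 1.0 means the application met or beat linear scaling.
    """
    ideal = fraction_large / fraction_small   # e.g. 6 for 1/6 -> 1
    measured = score_large / score_small
    return measured / float(ideal)


# Illustrative only: an application that goes from 10 fps to 50 fps (5x)
# when moving from 1/6 of an A10 to a full A10 (6x the resources).
eff = scaling_efficiency(10.0, 50.0, Fraction(1, 6), Fraction(1, 1))
print(f"{eff:.0%}")  # 83%
```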
Graph 2 shows the percentage performance of each application normalized for cost at each NVads A10 v5 size. This can help you select the VM with the best cost-to-performance ratio for your workload.
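Normalizing for cost comes down to dividing each size's score by its price and expressing the result relative to a reference size. The sketch below assumes a hypothetical `perf_per_cost` helper, and the size names, scores, and hourly prices are placeholders, not Azure pricing or measured results.

```python
def perf_per_cost(results, reference):
    """Return each size's score-per-price as a percentage of the reference size.

    results maps a VM size name to (score, hourly_price).
    """
    ref_score, ref_price = results[reference]
    ref_value = ref_score / ref_price
    return {
        size: 100.0 * (score / price) / ref_value
        for size, (score, price) in results.items()
    }


# Placeholder numbers for illustration only; not real scores or prices.
example = {
    "size_small": (10.0, 1.0),
    "size_large": (45.0, 6.0),
}
for size, pct in perf_per_cost(example, "size_small").items():
    print(size, round(pct, 1))
# size_small 100.0
# size_large 75.0
```

A value below 100 means the larger size buys less performance per dollar than the reference for that application, even if its absolute score is higher.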
V-Ray 5 Benchmark and Cinebench R15 OpenGL Comparisons between the NVads A10 v5 and NCas T4 v3 Series
Tests: Rendering capability of a given system
Relevant for: Media and entertainment (FX) users, and rendering across the board
Workload: Media/Entertainment
VDI protocol: Teradici
Applications: Cinebench R15, V-Ray 5 Benchmark (GPU RTX)
V-Ray GPU RTX measures the rendering capability of the hardware. The results show that the NV36ads A10 v5 (one full A10) delivers about 2.07x the rendering capability of the NC16as T4 v3 (one T4). This benchmark is highly dependent on CPU speed for rendering the frames.
3DMark is a better benchmark for measuring the ability to visualize frames. For workloads that combine CPU rendering with GPU visualization, the A10 Tensor Core GPU is better suited than the NVIDIA T4 Tensor Core GPU, but for pure rendering the T4 is directly comparable to half of an A10.
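The half-A10 comparison follows from the 2.07x V-Ray result with simple arithmetic. The NV36ads A10 v5 has one full A10 and the NC16as T4 v3 has one T4, so rescaling the ratio to half an A10 gives about 1.035x, close to parity. This assumes rendering throughput scales linearly with the GPU fraction, which is an assumption made here and was not measured for V-Ray. The `scaled_ratio` helper is hypothetical.

```python
def scaled_ratio(measured_ratio, measured_share, target_share):
    """Rescale a measured A-vs-B score ratio to a different GPU share of A.

    Assumes (hypothetically) that throughput scales linearly with the
    fraction of the GPU assigned to the VM.
    """
    return measured_ratio * (target_share / measured_share)


# Full A10 (share 1.0) measured at about 2.07x one T4 in V-Ray 5 GPU RTX.
half_a10_vs_t4 = scaled_ratio(2.07, measured_share=1.0, target_share=0.5)
print(f"{half_a10_vs_t4:.3f}")  # 1.035
```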
These tests were done through Teradici PCoIP, which is supported by the NVads A10 v5.
The NVads A10 v5 uses GPU partitioning (GPU-P), also known as virtual GPU, a technology that allows a single physical GPU to be shared by multiple users at once. GPU-P is based on single-root I/O virtualization (SR-IOV), which enables I/O device sharing by letting a single physical (root) function appear as multiple separate devices. It uses virtual functions, each of which maps the hardware resources that a child partition needs. When the child partition accesses the device, its virtual device driver can often reach the hardware directly, without having to communicate with the host.
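A toy model can make the physical-function/virtual-function relationship more concrete. The classes below are hypothetical and only mirror the description above: one physical function (the GPU) exposes several virtual functions, each assigned to at most one child partition (VM), and a partition can see only its own virtual function. This is not how the Azure host or the NVIDIA driver is actually implemented.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VirtualFunction:
    """One SR-IOV virtual function: a slice of the physical device."""
    index: int
    owner: Optional[str] = None  # child partition (VM) it is assigned to


@dataclass
class PhysicalFunction:
    """The physical GPU, exposing its resources as virtual functions."""
    name: str
    vfs: List[VirtualFunction] = field(default_factory=list)

    def enable_vfs(self, count: int) -> None:
        self.vfs = [VirtualFunction(i) for i in range(count)]

    def assign(self, vm: str) -> VirtualFunction:
        """Give the VM the first free VF; each VF has at most one owner."""
        for vf in self.vfs:
            if vf.owner is None:
                vf.owner = vm
                return vf
        raise RuntimeError(f"{self.name}: no free virtual functions")

    def visible_to(self, vm: str) -> List[VirtualFunction]:
        """A VM sees only the VFs assigned to it, never its neighbors'."""
        return [vf for vf in self.vfs if vf.owner == vm]


gpu = PhysicalFunction("A10")
gpu.enable_vfs(6)
for n in range(1, 7):
    gpu.assign(f"vm-{n}")
print([vf.index for vf in gpu.visible_to("vm-3")])  # [2]
```

Once all virtual functions are assigned, a further `assign` call raises `RuntimeError`, reflecting that a partitioned GPU has a fixed number of slots.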
With the NVads A10 v5, you can partition a single NVIDIA A10 Tensor Core GPU across as many as 6 virtual machines, each with isolated, predictable performance. Because of SR-IOV, each GPU partition acts as an individual machine that has access only to its own resources. Our team has also worked to improve the predictability, reliability, and simplicity of using the NV-series on Azure. See the NVads A10 v5 product documentation for more details, and stay tuned for further deep-dive benchmarking blogs for some of our major customer segments.
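The limit of 6 VMs per GPU follows from the profile arithmetic. An NVIDIA A10 has 24 GB of frame buffer, and NVIDIA Q-series vGPU profile names encode the per-VM frame buffer in GB, so A10-4Q (1/6 of the GPU) leaves room for 24 / 4 = 6 partitions. The `profile_for` helper below is a hypothetical illustration of that mapping, not an Azure or NVIDIA API.

```python
from fractions import Fraction

A10_FRAME_BUFFER_GB = 24  # total GDDR6 on one NVIDIA A10


def profile_for(gpu_fraction: Fraction) -> tuple:
    """Return (vGPU profile name, frame buffer in GB, VMs per physical GPU)."""
    if not 0 < gpu_fraction <= 1:
        raise ValueError("fraction must be in (0, 1] for a single GPU")
    gb = A10_FRAME_BUFFER_GB * gpu_fraction
    if gb.denominator != 1:
        raise ValueError("fraction does not map to a whole-GB profile")
    gb = int(gb)
    return f"A10-{gb}Q", gb, A10_FRAME_BUFFER_GB // gb


for frac in (Fraction(1, 6), Fraction(1, 3), Fraction(1, 2), Fraction(1)):
    print(frac, profile_for(frac))
# 1/6 ('A10-4Q', 4, 6)
# 1/3 ('A10-8Q', 8, 3)
# 1/2 ('A10-12Q', 12, 2)
# 1 ('A10-24Q', 24, 1)
```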