Authors: Amirreza Rastegari, Jon Shelley
Simcenter STAR-CCM+ is one of the computational fluid dynamics (CFD) applications most widely used by high performance computing (HPC) customers across a variety of research and commercial fields. Over the past 5 years, our partnership has focused on delivering great CFD performance in Azure. Recently, we evaluated Simcenter STAR-CCM+ across our HB series virtual machines (VMs) to see how the HBv3 VMs compare to previous generations in terms of cost[1], performance, and scalability.
The Azure HB series currently includes HBv3, HBv2, and HB VMs; their technical features are summarized in Table 1:
Table 1: Summary of VM specs for the HB series VMs.
We ran three benchmarks with Simcenter STAR-CCM+. They cover small, medium, and large CFD model sizes; details are listed in Table 2:
Table 2: Benchmarks used for the comparisons.
For the software stack, we used Simcenter STAR-CCM+ 2021.3, the CentOS 8.1 HPC Azure Marketplace image (maintained on GitHub), and the HPC-X 2.8.3 MPI library.
How we calculated performance and cost
To calculate performance, we ran each benchmark three times at each VM count and averaged the “average elapsed time” reported by each of the three runs. We then divided the resulting average for a single HB VM by the averages for every other VM count and VM type, so a relative performance above 1 means a faster run than the single-HB-VM baseline.
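A minimal Python sketch of the relative-performance calculation described above, assuming each run's “average elapsed time” has already been collected. The function name and the `(vm_type, vm_count)` keys are hypothetical; this is not the tooling used for the study, and no measured timings are included.

```python
from __future__ import annotations

from statistics import mean


def relative_performance(
    runs: dict[tuple[str, int], list[float]],
    baseline: tuple[str, int] = ("HB", 1),
) -> dict[tuple[str, int], float]:
    """Return baseline_time / time for every (vm_type, vm_count) key.

    `runs` maps a hypothetical (vm_type, vm_count) key to the "average
    elapsed time" reported by each repeated run (three runs per point in
    this study). A value above 1.0 means faster than the baseline, which
    defaults to a single HB VM.
    """
    # Step 1: average the repeated runs at each configuration.
    averages = {key: mean(times) for key, times in runs.items()}
    # Step 2: divide the baseline average by each configuration's average.
    t_baseline = averages[baseline]
    return {key: t_baseline / t for key, t in averages.items()}
```

Because the baseline is divided by itself, the single-HB-VM entry is always 1.0, so every other point reads directly as a multiple of that baseline.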
To calculate relative cost, we converted the average elapsed times from the performance step from seconds to hours and multiplied them by the pay-as-you-go hourly price of the VMs used in each job. We then divided each job cost by the cost for a single HB VM to obtain the relative cost. Our calculations do not include Simcenter STAR-CCM+ licensing costs; note that with the Power on Demand licensing scheme you pay per job rather than per number of cores used.
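The relative-cost step can be sketched the same way. This assumes the hourly price is charged for every VM in the job (so a job on N VMs costs N times the per-VM rate) and that licensing is excluded, as stated above. The function names and parameters are hypothetical, and prices are inputs rather than quoted Azure rates.

```python
def job_cost(elapsed_seconds: float, vm_count: int, hourly_price_per_vm: float) -> float:
    """Pay-as-you-go VM cost of one job; software licensing is not included."""
    hours = elapsed_seconds / 3600.0  # convert seconds to hours
    return hours * vm_count * hourly_price_per_vm


def relative_cost(
    elapsed_seconds: float,
    vm_count: int,
    hourly_price_per_vm: float,
    baseline_seconds: float,
    baseline_hourly_price: float,
) -> float:
    """Job cost divided by the cost of the single-VM baseline job (one HB VM here)."""
    baseline = job_cost(baseline_seconds, 1, baseline_hourly_price)
    return job_cost(elapsed_seconds, vm_count, hourly_price_per_vm) / baseline
```

The seconds-to-hours conversion cancels out in the ratio, so relative cost depends only on elapsed time, VM count, and hourly price. This is also why adding VMs raises the cost per job whenever the VM count grows faster than the speedup.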
Benchmark Results
Small Benchmark (< 10 million cells)
The Reactor benchmark represents a reactive flow simulation in a reactor chamber using the segregated solvers, with a mesh of approximately 9.1M cells. We used it to evaluate cost, performance, and scalability for small workloads.
Figure 1: Relative performance and cost of the simulations for the "Reactor" benchmark using Simcenter STAR-CCM+ on Azure’s HB series VMs.
As Figure 1 shows, the HB series VMs demonstrate excellent parallel efficiency and cost-performance up to 8 VMs. The largest performance differential appears at 4 VMs, where there are ~2.3M cells per VM. The HBv3 VM, with 1.5 GB of L3 3D V-Cache™[2] (3x as much L3 cache per core as HBv2 and 6x as much as HB), delivered a ~3.3x performance improvement over HB VMs, which launched only 4 years earlier.
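The cells-per-VM figures quoted in this post are simple averages of the total mesh size over the VM count. A minimal sketch, assuming STAR-CCM+ distributes the mesh roughly evenly across VMs (real partitions are not perfectly balanced); the function name is hypothetical:

```python
from __future__ import annotations


def cells_per_vm(total_cells: float, vm_counts: list[int]) -> dict[int, float]:
    """Average mesh cells per VM at each VM count, assuming an even split."""
    return {n: total_cells / n for n in vm_counts}


if __name__ == "__main__":
    # Reactor benchmark: ~9.1M cells; VM counts taken from the discussion above.
    for n, cells in cells_per_vm(9.1e6, [1, 4, 8]).items():
        print(f"{n} VM(s): {cells / 1e6:.2f}M cells per VM")
```

At 4 VMs this gives 9.1M / 4 ≈ 2.3M cells per VM, the point where the HBv3 advantage over HB is largest.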
Medium Benchmark (15 – 20 million cells)
The LeMans Poly 17M benchmark represents an external flow simulation around a race car using the segregated solvers, with a mesh of approximately 17M cells. For this benchmark, we scaled up to 32 VMs. At the 32-VM point, each HBv2 and HBv3 VM simulates ~0.5 million cells, which equates to ~7,000 cells per core. HB VMs simulated twice that amount (~1M cells per VM, ~14,000 cells per core). The results are shown in Figure 2.
Figure 2: Relative performance and cost of the simulations for the “LeMans Poly 17M” benchmark using Simcenter STAR-CCM+ on Azure’s HB series VMs.
For the LeMans Poly 17M benchmark, the HBv3 VMs delivered the lowest cost per job at 8 VMs, where there are ~2.1 million cells per VM. At this point, HBv3 delivered a ~3.4x performance improvement over HB VMs, with a speedup of ~10.4 compared to a linear speedup of 8 (that is, superlinear scaling).
Large Benchmark (~100 million cells)
The LeMans 100M Coupled benchmark is an extension of the LeMans Poly 17M benchmark with a much finer grid. It represents an external flow simulation around a race car using the coupled solvers, with a mesh of approximately 106M cells.
Figure 3: Relative performance and cost of the simulations for the "LeMans 100M Coupled" benchmark using Simcenter STAR-CCM+ on Azure’s HB series VMs.
For the LeMans 100M Coupled benchmark, the HBv3 VMs demonstrate excellent scalability and efficiency at all VM counts up to 64 VMs, as seen in Figure 3. The best scalability and lowest cost per job were obtained at the 32- and 64-VM marks, which correspond to ~3.3 million cells per VM at 32 VMs and ~1.6 million cells per VM at 64 VMs. Here, HBv3 delivered a ~2.8x performance improvement over HB VMs, with a speedup of ~38 compared to a linear speedup of 32.
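The quoted speedups can be expressed as parallel efficiency, E = S / N, where S is the measured speedup and N is the linear (ideal) speedup; values above 1.0 indicate superlinear scaling. The sketch below uses only the approximate speedups stated in this post, and the function name is hypothetical. A commonly cited cause of superlinear scaling is that per-VM working sets shrink as VMs are added and fit better in cache, which is plausible given HBv3's large L3 cache, although these runs do not isolate that effect.

```python
def parallel_efficiency(speedup: float, linear_speedup: int) -> float:
    """Measured speedup divided by ideal linear speedup; > 1.0 is superlinear."""
    return speedup / linear_speedup


# Approximate HBv3 speedups quoted in this post: (speedup, linear speedup).
QUOTED_SPEEDUPS = {
    "LeMans Poly 17M (8 VMs)": (10.4, 8),
    "LeMans 100M Coupled (32 VMs)": (38.0, 32),
}

if __name__ == "__main__":
    for label, (speedup, n) in QUOTED_SPEEDUPS.items():
        print(f"{label}: {parallel_efficiency(speedup, n):.0%} parallel efficiency")
```

This works out to roughly 130% for the LeMans Poly 17M case and roughly 119% for the LeMans 100M Coupled case.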
Summary and Conclusions
These benchmarking studies showcase Azure’s commitment to continuously improving the computational performance, scalability, and cost-effectiveness of its HPC offerings across three generations of hardware. These improvements let our customers cut both their overall costs and their time per solution. Compared with the 3–5-year life span of an on-premises cluster, the yearly improvements in the HB series lead to significant cost savings, faster time to solution, and shorter time to market.
Compared with the 3–4-year-old HB hardware, HBv3 delivers performance gains that help engineers and researchers be more productive and cut the VM cost per solution by 40–50%. The 1.5 GB of L3 3D V-Cache™ on HBv3 provides significant performance gains for CFD workloads. The Simcenter STAR-CCM+ results reported above show that optimal performance is achieved at around 2–3 million cells per VM.
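One way to turn the 2–3 million cells per VM observation into a starting point for sizing a new job is to pick, from a set of candidate VM counts, the one whose cells per VM is closest to the middle of that range. This is a hypothetical rule of thumb, not a STAR-CCM+ or Azure feature: the function name, the power-of-two candidates, and the 2.5M target are assumptions, and the best count for a given model should still be confirmed with a short scaling test.

```python
from __future__ import annotations


def pick_vm_count(
    total_cells: float,
    candidates: list[int],
    target_cells_per_vm: float = 2.5e6,  # assumed midpoint of the 2-3M range
) -> int:
    """Return the candidate VM count whose cells per VM is closest to the target."""
    return min(candidates, key=lambda n: abs(total_cells / n - target_cells_per_vm))


# Hypothetical candidate sizes: powers of two from 1 to 64 VMs.
CANDIDATES = [2**k for k in range(7)]

if __name__ == "__main__":
    for name, cells in [("Reactor", 9.1e6), ("LeMans Poly 17M", 17e6), ("LeMans 100M Coupled", 106e6)]:
        print(name, pick_vm_count(cells, CANDIDATES))
```

For the three benchmarks in this post, this picks 4, 8, and 32 VMs respectively, which line up with the points highlighted in the benchmark results.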
We invite you to learn more about the Azure HB series to see how it can help your business meet the challenges of tomorrow.
Additional Information:
- Azure High-Performance Computing
- Azure HPC documentation | Microsoft Learn
- HBv3-series - Azure Virtual Machines | Microsoft Learn
- HBv2-series - Azure Virtual Machines | Microsoft Learn
- HB-series - Azure Virtual Machines | Microsoft Learn
- Simcenter STAR-CCM+ | Siemens Software
[1] Cost comparisons were constructed using publicly available pay-as-you-go pricing in the East US region.
[2] https://www.amd.com/en/technologies/3d-v-cache