Authors: Amirreza Rastegari, Nihit Pokhrel, Nathan Baker, Jon Shelley
Introduction:
Nanoscale Molecular Dynamics (NAMD) is a molecular dynamics program designed for high-performance simulation of large biomolecular systems. The program scales well over many processors and nodes, making it an excellent fit for Azure High Performance Computing (HPC) infrastructure. Azure HB Series virtual machines (VMs) enable customers who run NAMD to accelerate their innovation by using the latest AMD CPU and NVIDIA Quantum InfiniBand offerings. These VMs, as shown in table 1, are designed to deliver leadership-class performance, scalability, and cost efficiency for various real-world HPC workloads at scale.
Table 1: HB-Series VM sizes benchmarked.
Performance Benchmarking and Simulation Models
The benchmarks we used to compare the HB-series VM offerings were the standard ApoA1 and STMV models. The software stack consisted of NAMD version 2.15 (Git-2022-07-21) running on the AlmaLinux HPC 8.6 Azure Marketplace image. These benchmarks enabled us to objectively compare performance across the various VM sizes using increasing numbers of VMs. The benchmark models are summarized in table 2 below.
Table 2: Description of the two benchmark models.
Representative images of the protein systems in these benchmarks are shown in figure 1 below.
[Images: ApoA1 (water not shown) and STMV (water not shown)]
Figure 1: Visual representation of the benchmark models used. Images generated using MOL* 3D viewer on RCSB PDB - 3D View.
Performance comparison across the HB series VMs
For each benchmark model, we scaled the simulations up to a maximum of 64 VMs (32 for the ApoA1 benchmark) and measured the performance in terms of nanoseconds per day (ns/day). Using this metric, we calculated the cost[1] per nanosecond (cost/ns) of the simulations to understand the relative cost per solution across 3-4 years of HB-series VM offerings.
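The cost-per-nanosecond metric can be sketched as follows. This is a minimal illustration that assumes cost is simply the number of VMs times an hourly VM price times 24 hours, divided by the measured ns/day; the post does not spell out its exact formula. The function name `cost_per_ns` and the example inputs are hypothetical, and the price used is a placeholder rather than an Azure list price.

```python
def cost_per_ns(ns_per_day: float, n_vms: int, usd_per_vm_hour: float) -> float:
    """Return the USD cost of simulating one nanosecond.

    Assumes the only cost is VM rental: n_vms VMs billed per hour,
    running for the 24 hours it takes to produce ns_per_day nanoseconds.
    """
    if ns_per_day <= 0:
        raise ValueError("ns_per_day must be positive")
    usd_per_day = n_vms * usd_per_vm_hour * 24.0
    return usd_per_day / ns_per_day


if __name__ == "__main__":
    # Placeholder inputs: 2 VMs at a hypothetical 1.00 USD/hour delivering 10 ns/day.
    print(f"{cost_per_ns(10.0, 2, 1.0):.2f} USD/ns")
```

Under this assumption, doubling the VM count only lowers the cost per nanosecond if it more than doubles the ns/day, which is why cost tends to rise as parallel efficiency falls.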
Figure 2: Performance and cost comparison of HB, HBv2 and HBv3 series VMs on the ApoA1 (92k atoms) model.
| N (VMs) | HB (60ppn, ns/day) | HBv2 (96ppn, ns/day) | HBv3 (96ppn, ns/day) |
|---|---|---|---|
| 1 | 5.1 | 11.4 | 11.4 |
| 2 | 8.5 | 20.2 | 20.4 |
| 4 | 12.8 | 38.1 | 38.6 |
| 8 | 29.4 | 66.7 | 67.3 |
| 16 | 37.9 | 102.6 | 109.6 |
| 32 | 40.1 | 120.7 | 134.5 |
Table 3: Performance of HB, HBv2 and HBv3 series VMs on the ApoA1 (92k atoms) model.
The ApoA1 (92k atoms) benchmark model consists of 92,224 atoms. With this model, the HBv3 VMs outperform the HB VMs by as much as 330%, while lowering the cost per nanosecond by up to 50%. The HBv2 VMs deliver almost identical performance to the HBv3 VMs on up to 8 VMs. Beyond this point, however, the HBv2 VMs fall behind the HBv3 VMs. The best performance was achieved using 32 HBv3 VMs, which delivered 134.5 ns/day at a cost of 22.6 USD/ns. However, because the problem is so small, the parallel efficiency with 32 HBv3 VMs drops to 37%, since there are only about 30 atoms/core.
A reasonable balance between cost and performance is obtained with 4 HBv3 VMs. At this point there are ~250 atoms/core in the simulation, and the parallel efficiency remains around 85%. The simulation's performance and cost per nanosecond are 38.6 ns/day and 9.8 USD/ns, respectively.
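The parallel efficiency and atoms-per-core figures quoted for ApoA1 can be reproduced from the HBv3 column of the table with a short calculation. The sketch below assumes efficiency is measured relative to the single-VM result and that all 96 processes per VM count as cores; the function names are illustrative, not part of NAMD or any Azure tooling.

```python
APOA1_ATOMS = 92_224
HBV3_CORES_PER_VM = 96  # processes per node (ppn) used for the ApoA1 runs

# HBv3 results from the ApoA1 table: number of VMs -> ns/day
HBV3_NS_PER_DAY = {1: 11.4, 2: 20.4, 4: 38.6, 8: 67.3, 16: 109.6, 32: 134.5}


def parallel_efficiency(perf: dict, n_vms: int, baseline_vms: int = 1) -> float:
    """Measured speedup over the baseline divided by the ideal (linear) speedup."""
    speedup = perf[n_vms] / perf[baseline_vms]
    ideal = n_vms / baseline_vms
    return speedup / ideal


def atoms_per_core(n_atoms: int, n_vms: int, cores_per_vm: int) -> float:
    """Average number of atoms assigned to each core."""
    return n_atoms / (n_vms * cores_per_vm)


if __name__ == "__main__":
    for n in sorted(HBV3_NS_PER_DAY):
        eff = parallel_efficiency(HBV3_NS_PER_DAY, n)
        apc = atoms_per_core(APOA1_ATOMS, n, HBV3_CORES_PER_VM)
        print(f"{n:>2} VMs: efficiency {eff:6.1%}, {apc:7.1f} atoms/core")
```

With these inputs the 4-VM case comes out at roughly 85% efficiency and about 240 atoms/core, and the 32-VM case at roughly 37% efficiency and about 30 atoms/core.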
Figure 3: Performance and cost comparison of HB, HBv2 and HBv3-series VMs on the STMV (1M atoms) model.
| N (VMs) | HB (60ppn, ns/day) | HBv2 (96ppn, ns/day) | HBv3 (96ppn, ns/day) |
|---|---|---|---|
| 1 | 0.49 | 1.13 | 1.18 |
| 2 | 0.90 | 1.91 | 1.97 |
| 4 | 1.55 | 4.31 | 4.55 |
| 8 | 3.81 | 8.49 | 8.96 |
| 16 | 6.97 | 15.95 | 16.87 |
| 32 | 10.11 | 27.18 | 29.65 |
| 64 | 18.24 | 31.94 | 42.05 |
Table 4: Performance of HB, HBv2 and HBv3 series VMs on the STMV (1M atoms) model.
In the STMV (1M atoms) benchmark model, which consists of one million atoms, the HBv3 VMs outperform the HB VMs by as much as 230%, while also lowering the cost per nanosecond by as much as 32%. The HBv2 VMs deliver performance similar to that of the HBv3 VMs on up to 4 VMs. Beyond this point, however, the HBv2 VMs fall behind the HBv3 VMs.
The best performance on the STMV (1M atoms) benchmark model was achieved using 64 HBv3 VMs. At this point we see ~42 ns/day at a cost of ~145 USD/ns. However, with this many VMs the parallel efficiency drops significantly because of the dwindling number of atoms/core (~350) in the simulation.
Figure 4: Performance and cost comparison of HB, HBv2 and HBv3 series VMs on the STMV (20M atoms) model.
| N (VMs) | HB (60ppn, ns/day) | HBv2 (120ppn, ns/day) | HBv3 (120ppn, ns/day) |
|---|---|---|---|
| 1 | 0.035 | 0.086 | 0.088 |
| 2 | 0.063 | 0.145 | 0.154 |
| 4 | 0.138 | 0.480 | 0.469 |
| 8 | 0.397 | 0.862 | 0.867 |
| 16 | 0.757 | 1.864 | 1.854 |
| 32 | 1.399 | 3.419 | 3.481 |
| 64 | 2.698 | 5.486 | 5.879 |
Table 5: Performance of HB, HBv2 and HBv3 series VMs on the STMV (20M atoms) model.
In the STMV (20M atoms) benchmark model, which consists of 20 million atoms, the HBv3 VMs outperformed the HB VMs by 218-251%, at around 64-68% of the cost per nanosecond. Since this is a compute-intensive problem, the HBv2 VMs deliver performance similar to that of the HBv3 VMs at VM counts of up to 16, with a small drop at larger VM counts. The best performance with this model was achieved with 64 HBv3 VMs, which delivered 5.879 ns/day at a cost of ~1035 USD/ns. For this benchmark we scaled the model to just over 3200 atoms/core at 64 HBv3 VMs. Over the range of 1-64 VMs we saw a parallel efficiency of ~105-135%, demonstrating superlinear speedup.
Figure 5: Performance and cost comparison of HB, HBv2 and HBv3 series VMs on the STMV (210M atoms) model.
| N (VMs) | HB (60ppn, ns/day) | HBv2 (120ppn, ns/day) | HBv3 (120ppn, ns/day) |
|---|---|---|---|
| 16 | 0.061 | 0.161 | 0.157 |
| 32 | 0.119 | 0.299 | 0.299 |
| 64 | 0.241 | 0.654 | 0.649 |
Table 6: Performance of HB, HBv2 and HBv3 series VMs on the STMV (210M atoms) model.
In the STMV (210M atoms) benchmark model, which consists of 210 million atoms, the HBv3 VMs outperform the HB VMs by 257-270%, at only ~60% of the cost. This benchmark is very compute-intensive, so the HBv2 VMs deliver performance similar to that of the HBv3 VMs. The highest performance with this model was achieved using 64 HBv3 VMs, which delivered 0.649 ns/day at a cost of ~9360 USD/ns. Because of the size of this benchmark, with more than 27300 atoms per core at 64 HBv3 VMs, the simulation's parallel efficiency remains impressive, reaching as high as 128% and demonstrating superlinear speedup.
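The generation-over-generation comparison can be checked by dividing the HBv3 ns/day by the HB ns/day at each VM count. The sketch below does this for the STMV (210M atoms) table; `relative_performance` is an illustrative helper name, and because the table entries are rounded, the ratios may differ slightly from the percentages quoted in the text.

```python
# ns/day from the STMV (210M atoms) table: number of VMs -> ns/day
HB_NS_PER_DAY = {16: 0.061, 32: 0.119, 64: 0.241}
HBV3_NS_PER_DAY_210M = {16: 0.157, 32: 0.299, 64: 0.649}


def relative_performance(new: dict, old: dict) -> dict:
    """Ratio of new to old performance at every VM count present in both."""
    shared = sorted(new.keys() & old.keys())
    return {n: new[n] / old[n] for n in shared}


if __name__ == "__main__":
    ratios = relative_performance(HBV3_NS_PER_DAY_210M, HB_NS_PER_DAY)
    for n_vms, ratio in ratios.items():
        print(f"{n_vms:>2} VMs: HBv3 delivers {ratio:.2f}x the HB ns/day")
```

From the rounded table values the ratios fall between roughly 2.5x and 2.7x, with the largest gain at 64 VMs.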
Continuous innovation with leading-edge capabilities
With Azure's HB-series VMs, NAMD customers can reduce the time and cost of their simulations. Across the benchmarks, which ranged from 1 to 64 VMs (60 to 7,680 cores), HBv3-series VMs provide the fastest time-to-solution at the lowest relative cost per NAMD simulation when compared to 3-4-year-old technology (HB-series VMs). These performance gains on HBv3 are due to the additional cores and the 200 Gb/s HDR InfiniBand. The benchmarks show that HBv3-series VMs deliver up to 3.1 times higher performance while reducing the cost per nanosecond of the simulation by as much as 1.9 times compared to HB-series VMs.
Additional Information
- Learn more about Azure HPC
- Azure HBv3-series VMs
- Azure WOC-Benchmarking GitHub Repository
- NAMD compilation recipe on Azure HPC GitHub Repository
- Azure HPC Content Hub
* NAMD was developed by the Theoretical and Computational Biophysics Group in the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign.
[1] Learn more about the pricing of each HB-series VM.