Azure HPC OnDemand Platform: Cloud HPC made easy.
As many customers look at running their HPC workloads in the cloud, onboarding effort and cost are key considerations. As an HPC administrator, you want to provide a unified user experience with minimal disruption, in which end users and cluster administrators keep most of their on-premises environment while leveraging the power of running in the cloud. The Specialized Workloads for Industry and Mission team, which works on some of the most complex HPC customer and partner scenarios, has built a solution accelerator, Azure HPC OnDemand Platform (aka az-hop), available in the Azure/az-hop public GitHub repository to help our HPC customers onboard faster. az-hop delivers a complete HPC cluster solution that is ready for users to run applications and is easy for HPC administrators to deploy and manage. az-hop builds on standard Azure building blocks and can be used as-is, or easily customized and extended to meet any uncovered requirements.
HPC Performance and Scalability Results with Azure HBv2 VMs

(Article contributed by Jon Shelly and Evan Burness, Azure)

Just in time for SC'19, Azure launched the new HBv2 Virtual Machines for High-Performance Computing (HPC) into Preview this week. These VMs feature a wealth of new technology, including:

- AMD EPYC 7742 (Rome) CPUs, 2.45 GHz base clock / 3.3 GHz boost clock
- 480 MB L3 cache, 480 GB RAM
- 340 GB/s of memory bandwidth
- 200 Gbps HDR InfiniBand (SR-IOV) with Adaptive Routing
- 900 GB SSD (NVMeDirect)

Below are initial performance characterizations across a variety of configurations, on both microbenchmarks and the commonly used HPC applications for which the HB family of VMs is optimized.

Microbenchmarks

MPI Latency
OSU Benchmarks (5.6.2), osu_latency, with MPI = HPC-X, Intel MPI, MVAPICH2, OpenMPI. Latency in microseconds (µs).

| Message Size (bytes) | HPC-X | Intel MPI | MVAPICH2 | OpenMPI |
|---:|---:|---:|---:|---:|
| 0 | 1.62 | 1.95 | 1.85 | 1.61 |
| 1 | 1.62 | 1.95 | 1.9 | 1.61 |
| 2 | 1.61 | 1.95 | 1.9 | 1.61 |
| 4 | 1.62 | 1.95 | 1.9 | 1.61 |
| 8 | 1.61 | 1.96 | 1.9 | 1.61 |
| 16 | 1.62 | 1.96 | 1.93 | 1.62 |
| 32 | 1.77 | 1.97 | 1.94 | 1.77 |
| 64 | 1.83 | 2.03 | 2.08 | 1.82 |
| 128 | 1.9 | 2.09 | 2.29 | 1.9 |
| 256 | 2.44 | 2.65 | 2.78 | 2.44 |
| 512 | 2.53 | 2.71 | 2.84 | 2.53 |
| 1024 | 2.63 | 2.84 | 2.93 | 2.62 |
| 2048 | 2.92 | 3.09 | 3.13 | 2.92 |
| 4096 | 3.72 | 3.89 | 4.07 | 3.74 |

MPI Bandwidth (2 QPs)
OSU Benchmarks (5.6.2), osu_mbw_mr with ppn = 2, with MPI = HPC-X. Bandwidth in MB/s.

| Message Size (bytes) | BW Peak | BW Average |
|---:|---:|---:|
| 4096 | 15920.65 | 15902.67 |
| 8192 | 23045.57 | 23036.88 |
| 16384 | 23270.14 | 23260.04 |
| 32768 | 23376.91 | 23372.9 |
| 65536 | 23423.49 | 23423.23 |
| 131072 | 23445.05 | 23443.6 |
| 262144 | 23463.94 | 23463.93 |
| 524288 | 23470.7 | 23470.55 |
| 1048576 | 23474.3 | 23474.08 |
| 2097152 | 23475.77 | 23475.73 |
| 4194304 | 23476.61 | 23476.61 |
| 8388608 | 23477.06 | 23477.05 |

Application Benchmarks

App: Siemens Star-CCM+
Version: 14.06.004
Model: LeMans 100M Coupled Solver
Configuration details: 116 MPI ranks were run in each HBv2 VM (4 ranks on each of 29 NUMA domains) in order to leave nominal resources for Linux background processes. Adaptive Routing was enabled, DCT (Dynamic Connected Transport) was used as the transport layer, and HPC-X version 2.50 was used for MPI. The Azure CentOS HPC 7.6 image from https://github.com/Azure/azhpc-images was used.

| VMs | Cores | PPN | SETime (s) | SpeedUp | ParallelEff (%) |
|---:|---:|---:|---:|---:|---:|
| 1 | 116 | 116 | 258.92 | 116 | 100 |
| 2 | 232 | 116 | 129.56 | 231.82 | 99.9 |
| 4 | 464 | 116 | 62.01 | 484.35 | 104.4 |
| 16 | 1856 | 116 | 16.46 | 1824.71 | 98.3 |
| 32 | 3712 | 116 | 8.4 | 3575.56 | 96.3 |
| 64 | 7424 | 116 | 4.8 | 6257.23 | 84.3 |
| 128 | 14848 | 116 | 2.5 | 12013.89 | 80.9 |

Summary: Star-CCM+ scaled at 81% efficiency to nearly 15,000 MPI ranks, delivering an application speedup of more than 12,000x. This compares favorably to Azure's previous best of more than 11,500 MPI ranks, which was itself a world record for MPI scalability on the public cloud.
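The SpeedUp and ParallelEff columns above follow directly from the solver elapsed times: speedup is the single-VM core count multiplied by the ratio of baseline to measured elapsed time, and efficiency is speedup divided by total cores. A minimal Python sketch that reproduces the columns (values copied from the table; the variable names are illustrative):

```python
# Reproduce the SpeedUp and ParallelEff columns of the Star-CCM+ table
# from the measured solver elapsed times (values copied from the table above).
base_cores = 116      # cores used in the single-VM baseline run
base_time = 258.92    # single-VM solver elapsed time, seconds

# (total cores, solver elapsed time in seconds) for the multi-VM runs
runs = [(232, 129.56), (464, 62.01), (1856, 16.46),
        (3712, 8.4), (7424, 4.8), (14848, 2.5)]

for cores, elapsed in runs:
    speedup = base_cores * base_time / elapsed    # e.g. 116 * 258.92 / 2.5 ~ 12013.9
    efficiency = 100.0 * speedup / cores          # parallel efficiency, percent
    print(f"{cores:5d} cores: speedup {speedup:9.2f}, efficiency {efficiency:5.1f}%")
```

At 128 VMs (14,848 cores) this yields a speedup of roughly 12,014 at about 81% efficiency, matching the table.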
App: ANSYS Fluent
Version: 14.06.004
Model: External Flow over a Formula-1 Race Car (f1_racecar_140m)
Configuration details: 60 MPI ranks were run in each HBv2 VM (2 of the 4 cores in each NUMA domain) in order to leave nominal resources for Linux background processes and to provide roughly 6 GB/s of memory bandwidth per core. Adaptive Routing was enabled, DCT (Dynamic Connected Transport) was used as the transport layer, and HPC-X version 2.50 was used for MPI. The Azure CentOS HPC 7.6 image from https://github.com/Azure/azhpc-images was used.

| VMs | HBv2 Solver Rating | HBv2 SpeedUp | Ideal Linear SpeedUp |
|---:|---:|---:|---:|
| 1 | 68.5 | 1 | 1 |
| 2 | 134.5 | 1.96 | 2 |
| 4 | 275.9 | 4.03 | 4 |
| 8 | 557.8 | 8.14 | 8 |
| 16 | 1122.1 | 16.38 | 16 |
| 32 | 2385.1 | 34.82 | 32 |
| 64 | 4601.9 | 67.18 | 64 |
| 128 | 9846.2 | 143.74 | 128 |

Summary: HBv2 VMs scale super-linearly (112%) up to the largest measured configuration of 128 VMs. The Fluent Solver Rating at this top-end level of scale is 83% higher than the current leading submission in the ANSYS public database for this model (https://bit.ly/2OdAExM).

Impact of Adaptive Routing

App: Siemens Star-CCM+
Version: 14.06.004
Model: LeMans 100M Coupled Solver
Configuration details: Star-CCM+ performance was compared on an apples-to-apples basis, with the only variable being Adaptive Routing disabled versus enabled.

Summary: Adaptive Routing, designed to drive higher sustained application scalability for large MPI jobs, delivered a scaling efficiency improvement of 17% over an identical job run with the feature disabled. This translates to faster time to solution and more efficient use of application licenses.
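As a closing note on the microbenchmark numbers earlier in this post, the peak osu_mbw_mr result can be expressed as utilization of the 200 Gbps HDR InfiniBand link. A minimal sketch, assuming OSU's convention of MB/s meaning 10^6 bytes per second:

```python
# Convert the peak osu_mbw_mr bandwidth (MB/s, assumed to be 10**6 bytes/s per
# the OSU convention) to Gbps and compare it with the 200 Gbps HDR line rate.
peak_mb_per_s = 23477.06    # peak value from the bandwidth table above
line_rate_gbps = 200.0      # HDR InfiniBand on HBv2

gbps = peak_mb_per_s * 1e6 * 8 / 1e9              # ~187.8 Gbps
utilization_pct = 100.0 * gbps / line_rate_gbps   # ~94% of line rate
print(f"{gbps:.1f} Gbps, {utilization_pct:.0f}% of the {line_rate_gbps:.0f} Gbps link")
```

That works out to roughly 188 Gbps, or about 94% of the nominal link rate.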
Introducing New Performance Tiers for Azure Managed Lustre: Enhancing HPC Workloads

Building on the success of its General Availability (GA) launch last month, we're excited to unveil two new performance tiers for Azure Managed Lustre (AMLFS): 40 MB/s per TiB and 500 MB/s per TiB. This blog post explores the specifics of these new tiers and how they embody a customer-centric approach to innovation.
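To put the per-TiB figures in context, aggregate filesystem throughput scales with provisioned capacity. The sketch below is only an illustration: the 128 TiB capacity is a hypothetical example, and the per-TiB rates are the two tier values quoted above.

```python
# Illustrative only: aggregate AMLFS throughput for the two new tiers at a
# hypothetical provisioned capacity (128 TiB is an example, not a product limit).
tiers_mb_per_tib = {"40 MB/s per TiB": 40, "500 MB/s per TiB": 500}
capacity_tib = 128

for tier, per_tib in tiers_mb_per_tib.items():
    aggregate_mb_s = per_tib * capacity_tib
    print(f"{tier}: {capacity_tib} TiB -> {aggregate_mb_s:,} MB/s "
          f"(~{aggregate_mb_s / 1000:.1f} GB/s)")
```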
Ramp up with me...on HPC: What is high-performance computing (HPC)?

Over the next several months, let's take a journey together and learn about the different HPC use cases. Join me as I dive into each use case; for some of them, I'll even try my hand at the workload for the first time. We'll talk about what went well and any issues I ran into. And maybe you'll get to hear a little about our customers and partners along the way.
Run WRF v4 on Azure HPC Virtual Machines

The Weather Research and Forecasting (WRF) model is a popular high-performance computing (HPC) code used by the weather and climate community. WRF v4 typically performs well on traditional HPC architectures that provide high floating-point throughput, high memory bandwidth, and a low-latency network, such as Top500 supercomputers. You can now find all of these characteristics in the new HBv2 Azure Virtual Machines (VMs) for HPC. With some minor tuning, large WRF v4 models perform very well on Azure.
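As a simple illustration of the kind of tuning involved: WRF is sensitive to memory bandwidth, so one common choice is how many MPI ranks to place on each VM. Using the 340 GB/s HBv2 memory bandwidth figure quoted earlier in this post, the sketch below estimates per-rank bandwidth for a few illustrative rank counts (examples only, not tuning recommendations):

```python
# Illustrative only: per-rank memory bandwidth on an HBv2 VM (340 GB/s total,
# as quoted in the HBv2 results above) for a few ranks-per-VM choices.
vm_memory_bw_gb_s = 340
for ranks_per_vm in (120, 116, 60):   # example values, not recommendations
    per_rank = vm_memory_bw_gb_s / ranks_per_vm
    print(f"{ranks_per_vm:3d} ranks/VM -> ~{per_rank:.1f} GB/s per rank")
```

Underpopulating to 60 ranks per VM, for example, is what gives the roughly 6 GB/s of memory bandwidth per core mentioned in the Fluent configuration above.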