
This is an exciting week with the International Supercomputing Conference, even though I am not attending. To get in the spirit, after seeing this article about how the entire Top500 list is now faster than 1 petaflop, I wondered whether I could build a cluster on Azure worthy of the "Petaflop Club". As it happened, I had a few nodes lying around after doing some large-scale CP2K runs.


Azure has recently launched two new VM types suitable for HPC. For this experiment the HC VM was used, which comprises dual-socket Intel Xeon Platinum 8168 nodes connected with 100 Gb/s EDR InfiniBand from Mellanox. To meet the petaflop challenge, a cluster of 512 nodes was used. By my calculations this should have a peak of 1.369 PFlop/s:


Rpeak (GFlops/s)
    = <frequency> * <cores-per-node> * <nodes> * <flops-per-cycle>
    = 1.9 * 44 * 512 * 32
    = 1369702.4

Notes:

  • Using the AVX-512 base frequency of 1.9 GHz - see here (although the effective frequency is probably slightly higher as turbo is enabled)
  • The Azure VM has 44 cores exposed per node
  • AVX-512 delivers 32 double-precision flops per core per cycle
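The arithmetic above can be checked with a few lines of Python:

```python
# Theoretical peak (Rpeak) for the 512-node HC cluster, in GFlop/s,
# using the figures from the notes above.
ghz = 1.9             # AVX-512 base frequency (GHz)
cores_per_node = 44   # cores exposed per HC VM
nodes = 512
flops_per_cycle = 32  # double-precision flops per core per cycle

rpeak_gflops = ghz * cores_per_node * nodes * flops_per_cycle
print(rpeak_gflops)   # ~1,369,702 GFlop/s, i.e. about 1.37 PFlop/s
```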

The entire cluster uses the CentOS 7.6 HPC Azure marketplace image, with the only addition being the "intel-mkl-2019" package, from which Linpack was taken. Linpack was run with the Intel MPI 2018 that is included in the image. This was only ever going to be a quick test, so I chose to use 32 GB of RAM per node with a problem size of 1,482,240, run with two ranks per node. Here are the results:


================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WC00C2R2     1482240   384    32    32            1816.51            1.19516e+06
HPL_pdgesv() start time Wed Jun 19 13:32:59 2019

HPL_pdgesv() end time   Wed Jun 19 14:03:16 2019

--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0010026 ...... PASSED
================================================================================
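As a sanity check, the problem size quoted above can be roughly reproduced from the 32 GB-per-node memory budget; the round-down-to-a-multiple-of-NB step is my assumption about how N was chosen, so this sketch lands near, not exactly on, the value used:

```python
import math

# Rough HPL sizing: pick N so the N x N matrix of doubles fills
# roughly 32 GB on each of the 512 nodes, then round down to a
# multiple of the block size NB (384 in this run).
nodes, mem_gb, nb = 512, 32, 384
total_bytes = nodes * mem_gb * 2**30
n_raw = math.isqrt(total_bytes // 8)  # 8 bytes per double
n = (n_raw // nb) * nb
print(n)  # close to the 1,482,240 actually used
```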

This shows Azure is definitely a strong contender for the "Petaflop Club" :) In fact, the score of 1.195 PFlop/s would rank it in 368th place on the latest list.
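For context, the achieved score also works out to a healthy fraction of the theoretical peak; a quick check:

```python
# HPL efficiency: achieved Gflops from the run above divided by
# the theoretical Rpeak calculated earlier.
rmax_gflops = 1.19516e6                  # from the HPL output
rpeak_gflops = 1.9 * 44 * 512 * 32       # theoretical peak
efficiency = rmax_gflops / rpeak_gflops
print(f"{efficiency:.1%}")               # roughly 87%
```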