Microsoft

Azure Accelerated Networking is now available on the RDMA over InfiniBand capable and SR-IOV enabled VM sizes HB, HC, HBv2 and NDv2. Accelerated Networking enables Single Root IO Virtualization (SR-IOV) for a VM’s Ethernet SmartNIC, resulting in enhanced throughput of 30 Gbps and lower, more consistent latencies over the Azure Ethernet network. Performance data, with guidance on optimizations to achieve higher throughput of up to 38 Gbps on some VMs, is available in this blog post (Performance impact of enabling AccelNet).

 

Note that this enhanced Ethernet capability is in addition to the RDMA capabilities over the InfiniBand network. Accelerated Networking over the Ethernet network will improve the performance of loading VM OS images, accessing Azure Storage resources, or communicating with other resources, including Compute VMs.

 

When enabling Accelerated Networking on supported Azure HPC and GPU VMs, it is important to understand the changes you will see within the VM and what those changes may mean for your workloads. Some platform changes for this capability may affect the behavior of certain MPI libraries (and older versions) when running jobs over InfiniBand. Specifically, the InfiniBand interface on some VMs may have a slightly different name (mlx5_1 as opposed to the earlier mlx5_0), and this may require tweaking the MPI command lines, especially when using the UCX interface (commonly with OpenMPI and HPC-X). This article provides details on how to address any observed issues.

 

Enabling Accelerated Networking

New VMs with Accelerated Networking (AN) can be created for Linux and Windows. Supported operating systems are listed in the respective links. AN can be enabled on existing VMs by deallocating the VM, enabling AN, and restarting the VM (instructions).
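
As a rough sketch of the deallocate, enable, restart sequence described above, the following wraps the three Azure CLI calls in a shell function. The function name and the resource group, VM and NIC names passed to it are placeholders and not part of the linked instructions. The AZ variable is an illustrative convenience so that the commands can be printed (AZ=echo) rather than executed.

```shell
#!/usr/bin/env bash
# Sketch: enable Accelerated Networking (AN) on an existing VM.
# Arguments (all placeholders supplied by the caller):
#   $1 resource group, $2 VM name, $3 name of the VM's network interface.
# AZ defaults to the Azure CLI; set AZ=echo to print the commands instead.
enable_accelerated_networking() {
    local rg="$1" vm="$2" nic="$3"
    local az="${AZ:-az}"

    # 1. Deallocate the VM.
    "$az" vm deallocate --resource-group "$rg" --name "$vm" || return 1
    # 2. Enable AN on the VM's network interface.
    "$az" network nic update --resource-group "$rg" --name "$nic" \
        --accelerated-networking true || return 1
    # 3. Start the VM again.
    "$az" vm start --resource-group "$rg" --name "$vm"
}
```

For example, running AZ=echo enable_accelerated_networking myResourceGroup myVM myNic prints the three commands without touching any resources, which is a convenient way to review them first.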

Note that certain methods of creating/orchestrating VMs will enable AN by default. For example, if the VM type is AN-enabled, the Portal method of creation will, by default, check "Accelerated Networking".

Capture.PNG

Some other orchestrators may also choose to enable AN by default, such as CycleCloud, which is the simplest way to get started with HPC on Azure.

 

New network interface:

As the Ethernet NIC is now SR-IOV enabled, it will show up as a new network interface. Running the lspci command should show an additional Mellanox virtual function (VF) for Ethernet. On an InfiniBand enabled VM, there will be 2 VFs (Ethernet and InfiniBand). For example, the following is a screenshot from an Azure HBv2 VM instance (with OFED drivers for InfiniBand).

 

ibv_devinfo.PNG

Figure 1: In this example, "mlx5_0" is the Ethernet interface, and "mlx5_1" is the InfiniBand interface.

 

ibstat.PNG

Figure 2: InfiniBand interface status (mlx5_1 in this example)
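
The same distinction can also be read directly from sysfs, without ibv_devinfo or ibstat. The sketch below assumes the standard Linux layout in which each RDMA device exposes a link_layer file per port under /sys/class/infiniband. The function name and its optional root argument are illustrative only.

```shell
#!/usr/bin/env bash
# Print each RDMA device together with the link layer of its port 1.
# The optional argument points at an alternative sysfs root (useful for testing).
list_rdma_link_layers() {
    local root="${1:-/sys/class/infiniband}"
    local dev_path
    for dev_path in "$root"/*; do
        # Skip entries without a readable link_layer file (or an unmatched glob).
        [ -r "$dev_path/ports/1/link_layer" ] || continue
        printf '%s %s\n' "${dev_path##*/}" "$(cat "$dev_path/ports/1/link_layer")"
    done
}
```

Calling list_rdma_link_layers on a VM like the one in Figure 1 would be expected to report mlx5_0 as Ethernet and mlx5_1 as InfiniBand.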

 

Impact of additional network interface:

As InfiniBand device assignment is asynchronous, the device order in the VM can be random. That is, some VMs may get "mlx5_0" as the InfiniBand interface, whereas others may get "mlx5_1".

 

However, this can be made deterministic by using a udev rule, proposed by rdma-core.

 

 

$ cat /etc/udev/rules.d/60-ib.rules
# SPDX-License-Identifier: (GPL-2.0 OR Linux-OpenIB)
# Copyright (c) 2019, Mellanox Technologies. All rights reserved. See COPYING file
#
# Rename modes:
# NAME_FALLBACK - Try to name devices in the following order:
#                 by-pci -> by-guid -> kernel
# NAME_KERNEL - leave name as kernel provided
# NAME_PCI - based on PCI/slot/function location
# NAME_GUID - based on system image GUID
#
# The stable names are combination of device type technology and rename mode.
# Infiniband - ib*
# RoCE - roce*
# iWARP - iw*
# OPA - opa*
# Default (unknown protocol) - rdma*
#
# Example:
# * NAME_PCI
#   pci = 0000:00:0c.4
#   Device type = IB
#   mlx5_0 -> ibp0s12f4
# * NAME_GUID
#   GUID = 5254:00c0:fe12:3455
#   Device type = RoCE
#   mlx5_0 -> rocex525400c0fe123455
#
ACTION=="add", SUBSYSTEM=="infiniband", PROGRAM="rdma_rename %k NAME_PCI"

 

 

 

With the above udev rule, the interfaces can be named as follows:

udevrule-devices.PNG

 

Note that the interface name can appear differently on each VM as the PCI ID for the InfiniBand VF is different on each VM. There is ongoing work to make the PCI ID unique such that the interface is consistent across all VMs.
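
To illustrate why the name follows the PCI ID, here is a small sketch that reproduces the NAME_PCI example from the comments of the rule above (pci = 0000:00:0c.4 gives ibp0s12f4). It is not the rdma_rename implementation: the function name is hypothetical, the handling of a non-zero PCI domain (a leading P followed by the domain) is an assumption, and rdma_rename may choose a different form on some platforms.

```shell
#!/usr/bin/env bash
# Derive a NAME_PCI-style name from a PCI address of the form dddd:bb:ss.f.
# Domain, bus, slot and function are hexadecimal in the address and are
# written in decimal in the resulting name.
pci_to_rdma_name() {
    local prefix="$1" addr="$2"
    local domain bus slot func
    IFS=':.' read -r domain bus slot func <<< "$addr"
    local name="$prefix"
    # Assumption: a non-zero PCI domain is encoded as a leading P<domain>.
    if [ "$((16#$domain))" -ne 0 ]; then
        name+="P$((16#$domain))"
    fi
    printf '%sp%ds%df%d\n' "$name" "$((16#$bus))" "$((16#$slot))" "$((16#$func))"
}

pci_to_rdma_name ib 0000:00:0c.4   # example from the rule comments: ibp0s12f4
```

A VF that lands on slot 0d instead of 0c would come out as ibp0s13f4, which is the kind of per-VM difference described above.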

 

Impact on MPI libraries

Most MPI libraries do not need any changes to adapt to this new interface. However, certain MPI libraries, especially those using older UCX versions, may try to use the first available interface. If the first interface happens to be the Ethernet VF (due to asynchronous initialization), MPI jobs can fail when using such MPI libraries (and versions).
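
For UCX versions without the fix, one possible workaround is to pin UCX to the InfiniBand device explicitly through the UCX_NET_DEVICES environment variable. The sketch below picks the first device whose port 1 reports an InfiniBand link layer in sysfs and runs a command with UCX restricted to it. The function names and the RDMA_SYSFS_ROOT override are illustrative assumptions, not something provided by UCX or the Azure images.

```shell
#!/usr/bin/env bash
# Print the name of the first RDMA device whose port 1 link layer is InfiniBand.
# RDMA_SYSFS_ROOT is a hypothetical override for the sysfs location.
first_ib_device() {
    local root="${RDMA_SYSFS_ROOT:-/sys/class/infiniband}"
    local dev_path
    for dev_path in "$root"/*; do
        if [ "$(cat "$dev_path/ports/1/link_layer" 2>/dev/null)" = "InfiniBand" ]; then
            printf '%s\n' "${dev_path##*/}"
            return 0
        fi
    done
    return 1
}

# Run the given command with UCX restricted to that device, port 1.
run_on_ib() {
    local dev
    dev="$(first_ib_device)" || { echo "no InfiniBand device found" >&2; return 1; }
    UCX_NET_DEVICES="${dev}:1" "$@"
}
```

If these functions are saved in a wrapper script that ends with run_on_ib "$@" (say ib_wrapper.sh, a made-up name), launching mpirun ... ./ib_wrapper.sh ./my_mpi_app lets each rank select the device name that is valid on its own node, since the lookup runs locally on every VM.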

When PKEYs are explicitly required (e.g. for Platform MPI) for communication with VMs in the same InfiniBand tenant, ensure that the PKEYs are probed for in the location that corresponds to the InfiniBand interface.
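
As a sketch of probing in the right location, the function below reads the partition key table of a given device and port from sysfs, so the device name (mlx5_0 or mlx5_1) has to be the one that is actually the InfiniBand interface on that VM. The function name and the RDMA_SYSFS_ROOT override are illustrative. Skipping 0x7fff as the default partition is an assumption based on the linked PKEY guidance, and 0x0000 entries are treated as empty table slots.

```shell
#!/usr/bin/env bash
# Print the non-default partition keys of one port of one RDMA device.
#   $1 device name (for example mlx5_1), $2 port number (defaults to 1).
list_pkeys() {
    local dev="$1" port="${2:-1}"
    local root="${RDMA_SYSFS_ROOT:-/sys/class/infiniband}"
    local f key
    for f in "$root/$dev/ports/$port/pkeys"/*; do
        [ -r "$f" ] || continue
        key="$(cat "$f")"
        case "$key" in
            # Assumption: 0x0000 is an unused slot, 0x7fff the default partition.
            0x0000|0x7fff) continue ;;
        esac
        printf '%s\n' "$key"
    done
}
```

Calling list_pkeys with the Ethernet device's name is expected to yield no usable key, which is a quick hint that the wrong interface is being probed.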

 

Resolution

The root cause has now been fixed in UCX. So, if you are using an MPI library that uses UCX, please make sure to build against and use a UCX version that includes this fix.

HPC-X 2.7.4 for Azure, a recommended MPI implementation, includes this fix. A 2.8.0 build is also available for download from Nvidia Networking.

 

The Azure CentOS-HPC VM images (CentOS 7.8, 8.1 based) on the Marketplace have also been updated with the above (and MOFED 5.2):

 

Offer       Publisher  Sku       Urn                                           Version
----------  ---------  --------  --------------------------------------------  --------------
CentOS-HPC  OpenLogic  7_8       OpenLogic:CentOS-HPC:7_8:7.8.2021020400       7.8.2021020400
CentOS-HPC  OpenLogic  7_8-gen2  OpenLogic:CentOS-HPC:7_8-gen2:7.8.2021020401  7.8.2021020401
CentOS-HPC  OpenLogic  8_1       OpenLogic:CentOS-HPC:8_1:8.1.2021020400       8.1.2021020400
CentOS-HPC  OpenLogic  8_1-gen2  OpenLogic:CentOS-HPC:8_1-gen2:8.1.2021020401  8.1.2021020401

 

Note that the CentOS-HPC 7.6 and 7.7 images have NOT yet been updated with HPC-X 2.7.4, which contains the UCX fix for the Accelerated Networking issue discussed here. All CentOS-HPC images in the Marketplace remain usable for all other scenarios.

More details on the Azure CentOS-HPC VM images are available in the blog post and on GitHub.

9 Comments
Regular Visitor

Hi Jithin,

 

Thanks for the clarification!

 

Is the IB device naming consistent across all VMs in a scale set? Or is it possible to get different IB device names in different VMs?

 

thanks

Michael

Regular Visitor

Hi Jithin,

 

Does the hypervisor now identify correctly the CPU set that is associated with the physical proximity of the HCAs?

 

That is, if IPOIB is the IPoIB interface name, and I run

 

$ cat /sys/class/net/${IPOIB}/device/local_cpulist /sys/class/net/${IPOIB}/device/local_cpus

will I get the CPUs closer to the device? 

 

thanks 

 

Microsoft

@drMikeT, devices in different VMs may get different names, as the name is just a translation of the PCI ID. We are trying to make the PCI ID unique so that the name will be consistent across VMs. Until this is done, our recommendation is to disable AN during deployments.
I will update this blog as we have more updates on this front.

 

-Jithin

Regular Visitor

When IB names are different, can UCX correctly identify and use the correct IB interface, or should we specify it on the command line with an MPMD-style mpirun?

Microsoft

We are working on a solution that will avoid the need for MPMD style mpirun. With this solution, the IB interface name will be consistent across all VMs.

Regular Visitor

In the interim, can we get a script (or any other way) to identify the correct IB (or eth) IP device name per node?

 

Do you know if Azure will ever offer VMs with multiple physical IB interfaces?

 

When we see IPoIB names such as ib0, does this imply that AN has not yet been enabled?

 

Thanks!

Microsoft

Jithin,
A customer mentioned that they enabled Accelerated Networking, started a VMSS, and then disabled AN. After that, they could not add or remove VMs on the same VMSS. Do you have any comment?

 

Thanks!

Microsoft

@jithinjose and @AmanVerma, thanks for the updates. I don't see the proposed udev rule 60 in the 78-hpc.ks here: AzureBuildCentOS/centos78-hpc.ks at master · openlogic/AzureBuildCentOS · GitHub. Is it possible to add it?

Microsoft

@Jerrance: @jithinjose also confirms that we’re not planning to include the udev rule in the images today.

That would change the current behavior (naming and ordering of devices), and we don’t necessarily want to break that behavior given that the current fix (with the new UCX) works.

However this (udev rule in images) might be an option in the future (pending additional work) if we can make it consistent across SKUs.