Configuring InfiniBand for Ubuntu HPC and GPU VMs
Published Mar 10 2020 08:58 PM
Microsoft

Interest in the HPC-optimized VM images that we publish has grown because of the:

  1. GA of SR-IOV enabled HPC VMs (HB, HC, HB_v2),
  2. Recent platform update to make NCr_v3 SR-IOV enabled,
  3. GA of NDr_v2

While those images (CentOS-HPC 7.6, 7.7) were originally targeted at the SR-IOV enabled HPC VMs (HB, HC, HB_v2), conceptually they are also useful for the GPU VMs that are now SR-IOV enabled (NCr_v3, NDr_v2). Note that the GPU VMs additionally require the Nvidia GPU drivers (installed through the VM extension or manually).

 

Typically, we find that users of the HPC VMs running traditional HPC applications prefer CentOS as their OS, while users of AI/ML applications running on the GPU VMs tend to prefer Ubuntu. The CentOS-HPC VM OS images (>=7.6 for the SR-IOV enabled VMs, and <=7.5 for the non-SR-IOV enabled VMs) provide a ready-to-use VM image with the appropriate drivers and MPI runtimes.

 

This article attempts to consolidate guidance on configuring InfiniBand (IB) for Ubuntu across both SR-IOV and non-SR-IOV enabled HPC and GPU VMs. Specifically, it focuses on getting the right drivers set up and on bringing up the appropriate IB interface on the VMs. At the time of writing, the following steps apply at least to the Ubuntu 18.04 LTS image by Canonical on the Azure Marketplace.

 

NOTE: This article was written in March 2020. Many developments have happened since then, including the GA of new H* and N* VM sizes, as well as newer CentOS-HPC VM image versions. In fact, an Ubuntu-HPC VM image (for newer SR-IOV enabled VM sizes) is now also available. See the HPC VM image documentation and the TechCommunity blog on HPC VM images for more details.

 

Non-SR-IOV enabled VMs

  1. Install dapl (and its dependencies rdma_cm and ibverbs) and the user-mode mlx4 library.

    sudo apt-get update
    sudo apt-get install libdapl2 libmlx4-1
  2. In /etc/waagent.conf, enable RDMA by uncommenting the following configuration lines (root access)

    OS.EnableRDMA=y
    OS.UpdateRdmaDriver=y
  3. Restart the waagent service

    sudo systemctl restart walinuxagent.service

     

The IB interface eth1 should come up with an RDMA IP address.
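
Step 2 above can also be scripted rather than edited by hand. The following is a minimal sketch, assuming the two settings appear in the file as commented-out lines such as "# OS.EnableRDMA=y". The helper name enable_rdma_settings is hypothetical and is not part of the Linux agent.

```shell
#!/bin/sh
# Sketch: uncomment the RDMA settings in a waagent.conf-style file.
# enable_rdma_settings is a hypothetical helper, not part of waagent.
# Assumption: the settings are present as "# OS.EnableRDMA=y" and
# "# OS.UpdateRdmaDriver=y" (a leading "#" with optional whitespace).
enable_rdma_settings() {
    conf_file=$1
    # -i.bak edits in place and keeps the original as <file>.bak.
    sed -i.bak \
        -e 's/^#[[:space:]]*\(OS\.EnableRDMA=y\)/\1/' \
        -e 's/^#[[:space:]]*\(OS\.UpdateRdmaDriver=y\)/\1/' \
        "$conf_file"
}
```

On a real VM the function would be run as root against /etc/waagent.conf, followed by the service restart from step 3. Lines that are already uncommented are left unchanged.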

 

The IB-related kernel modules are no longer auto-loaded on Ubuntu. This is a departure from earlier practice, where the kernel modules were built into the image. They are now available as loadable modules so that a user can install the Mellanox OFED driver.

 

Note:

Support for the NetworkDirect driver stack (the vmbus-rdma-driver required on the non-SR-IOV VMs) was dropped in the 5.3 kernel shipped in the 18.04-LTS 18.04.202004290 image in the Marketplace. This may lead to issues in bringing up the IB interface, as reported here. Canonical may address this starting in kernel 5.4.

As a workaround, an older image with kernel 5.0 (for example, Canonical UbuntuServer 18.04-LTS 18.04.202004080 with the 5.0.0-1036-azure kernel) still has the missing module "hv_network_direct" and works fine.
Ubuntu 20.04 also does not show this issue.

 

SR-IOV enabled VMs with inbox driver

  1. Load the following kernel modules (either with modprobe or by editing /etc/modules)

    ib_uverbs
    rdma_ucm
    ib_umad
    ib_ipoib
  2. In /etc/waagent.conf, enable RDMA by uncommenting the following configuration line (requires root access)

    OS.EnableRDMA=y
  3. Reboot VM

    sudo reboot

The IB interface ib0 should come up with an RDMA IP address.
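
For step 1, the modules can be loaded immediately with modprobe and also listed in /etc/modules so that they load on the next boot. The sketch below covers the second part. The helper persist_modules is a hypothetical name introduced for illustration; it appends only the module names that are not already listed.

```shell
#!/bin/sh
# Sketch: append module names to an /etc/modules-style file, skipping
# names that are already listed. persist_modules is a hypothetical helper.
persist_modules() {
    modules_file=$1
    shift
    for mod in "$@"; do
        # -x matches the whole line, -F treats the name as a literal string.
        if ! grep -qxF "$mod" "$modules_file" 2>/dev/null; then
            printf '%s\n' "$mod" >> "$modules_file"
        fi
    done
}

# Example on a real VM (needs root):
#   modprobe -a ib_uverbs rdma_ucm ib_umad ib_ipoib
#   persist_modules /etc/modules ib_uverbs rdma_ucm ib_umad ib_ipoib
```

Because existing entries are skipped, running the helper more than once does not duplicate lines in the file.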

 

SR-IOV enabled VMs with OFED driver

  1. The appropriate Mellanox OFED driver can be downloaded and installed as shown below

    wget http://content.mellanox.com/ofed/MLNX_OFED-5.0-1.0.0.0/MLNX_OFED_LINUX-5.0-1.0.0.0-ubuntu18.04-x86_64.tgz
    tar zxvf MLNX_OFED_LINUX-5.0-1.0.0.0-ubuntu18.04-x86_64.tgz
    sudo ./MLNX_OFED_LINUX-5.0-1.0.0.0-ubuntu18.04-x86_64/mlnxofedinstall --add-kernel-support
  2. Load the new driver

    sudo /etc/init.d/openibd restart
  3. In /etc/waagent.conf, enable RDMA by uncommenting the following configuration line (requires root access)

    OS.EnableRDMA=y
  4. If needed, assign the RDMA IP address manually to the ib0 interface

    IP=$(sudo sed '/rdmaIPv4Address=/!d;s/.*rdmaIPv4Address="\([0-9.]*\)".*/\1/' /var/lib/waagent/SharedConfig.xml)/16
    sudo ifconfig ib0 $IP
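
The one-liner in step 4 yields just "/16" if the attribute is missing from SharedConfig.xml, which would then be passed to ifconfig. A slightly more defensive sketch is shown below. The helper name rdma_ip_from_config is hypothetical, and the use of the iproute2 ip command in the example comment is an alternative to ifconfig, not something the original steps require.

```shell
#!/bin/sh
# Sketch: print the RDMA IPv4 address found in a SharedConfig.xml-style
# file, or fail if no rdmaIPv4Address attribute is present.
# rdma_ip_from_config is a hypothetical helper.
rdma_ip_from_config() {
    rdma_ip=$(sed -n 's/.*rdmaIPv4Address="\([0-9.]*\)".*/\1/p' "$1" | head -n 1)
    # Refuse to continue with an empty address.
    [ -n "$rdma_ip" ] || return 1
    printf '%s\n' "$rdma_ip"
}

# Example on a real VM (needs root; the /16 prefix is taken from step 4):
#   addr=$(rdma_ip_from_config /var/lib/waagent/SharedConfig.xml) || exit 1
#   ip addr add "$addr/16" dev ib0
#   ip link set ib0 up
```

The non-zero return status makes it straightforward to stop a provisioning script before it assigns a malformed address.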

 

The following is an optional step that applies to all three modes above, although it is not strictly part of configuring IB. When running applications as a non-root user, set the following memory limits in /etc/security/limits.conf.

<user or group_name or *> hard memlock <memory_required_by_application_in_KB or unlimited>
<user or group_name or *> soft memlock <memory_required_by_application_in_KB or unlimited>
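
As an illustration of the template above, the following sketch prints the two lines for a given user (or group, or *) and limit. The helper name memlock_limits is hypothetical; the choice of "*" and "unlimited" in the example is only one possible setting and depends on the application.

```shell
#!/bin/sh
# Sketch: print the hard and soft memlock entries for limits.conf.
# memlock_limits is a hypothetical helper; $1 is a user, group, or "*",
# and $2 is a size in KB or "unlimited".
memlock_limits() {
    who=$1
    limit=$2
    printf '%s hard memlock %s\n' "$who" "$limit"
    printf '%s soft memlock %s\n' "$who" "$limit"
}

# Example on a real VM (needs root):
#   memlock_limits '*' unlimited | tee -a /etc/security/limits.conf
# After logging in again, "ulimit -l" should report the new limit.
```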

 

Scripts to configure an Ubuntu-based VM image with OFED drivers and MPI packages are also available on GitHub.

 

Note:

There is a known issue with cloud-init on Ubuntu VM images as it tries to bring up the IB interface. This can happen either on VM reboot or when trying to create a VM image after generalization. The VM boot logs may show an error like so: “Starting Network Service...RuntimeError: duplicate mac found! both 'eth1' and 'ib0' have mac”.

This "duplicate MAC with cloud-init on Ubuntu" problem is a known issue and will be addressed in the coming weeks. In the meantime, the workaround is to:

  • disable networking in cloud-init, and
  • then update the generated netplan to remove the MAC.

This workaround is described in more detail under Troubleshooting HPC Workloads.
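
A rough sketch of the two workaround steps follows. Several details here are assumptions rather than something this article specifies: the drop-in file name 99-disable-network-config.cfg, the netplan path /etc/netplan/50-cloud-init.yaml, and the minimal eth0-only netplan content. The helper name and its optional root-prefix argument are hypothetical and exist so that the sketch can be tried in a scratch directory first. Check the troubleshooting page for the authoritative steps.

```shell
#!/bin/sh
# Sketch of the duplicate-MAC workaround. File names and netplan content
# are assumptions; apply_duplicate_mac_workaround is a hypothetical helper.
# $1 is an optional root prefix (empty means the real filesystem).
apply_duplicate_mac_workaround() {
    root=${1:-}
    netplan_file="$root/etc/netplan/50-cloud-init.yaml"
    mkdir -p "$root/etc/cloud/cloud.cfg.d" "$root/etc/netplan"

    # 1. Stop cloud-init from regenerating the network configuration.
    printf 'network: {config: disabled}\n' \
        > "$root/etc/cloud/cloud.cfg.d/99-disable-network-config.cfg"

    # 2. Back up the generated netplan, then replace it with a version
    #    that does not match on a MAC address.
    if [ -f "$netplan_file" ]; then
        cp "$netplan_file" "$netplan_file.bak"
    fi
    cat > "$netplan_file" <<'EOF'
network:
  ethernets:
    eth0:
      dhcp4: true
  version: 2
EOF
}
```

Running the function with a temporary directory as its argument writes both files under that directory, which allows the result to be inspected before it is applied as root to the VM itself.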

Last update: Apr 28 2021 04:03 PM