A note about rhashtable kernel bug in CentOS 7.6/7.7
Published Jun 24 2020

Recently, the Azure HPC team and some customers have observed that a bug in older versions of the public Linux kernel can be triggered in ways that cause poor or variable performance for HPC applications running on Azure H-series VMs.

 

Specifically, the bug is in the rhashtable shrink logic and is present in Linux kernel versions 3.10.0-1062.12.1.el7 and below. These kernel versions are the defaults for Linux OSes commonly used for HPC, such as CentOS 7.6 and CentOS 7.7 VMs.

 

As described in the references below, this bug causes endless “insert_work” invocations because of repeated calls to rht_deferred_worker(). Thankfully, this kernel bug is fixed in kernel versions 3.10.0-1127.el7 and above.

 

More details about this bug can be found at the following resources:

Description: https://lkml.org/lkml/2019/1/23/789

Bug fix: https://github.com/torvalds/linux/commit/408f13ef358aa5ad56dc6230c2c7deb92cf462b1

 

Symptoms:

When this bug is encountered, one of the kernel worker threads will consume nearly 100% of a CPU core as shown below:

[Figure: top output showing a kworker kernel thread consuming nearly 100% of a CPU core]
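To identify the offending thread from a shell session, listing threads by CPU usage is enough. A minimal sketch (column names may vary slightly across ps versions):

# List the busiest threads system-wide; a kworker pinned near 100% CPU is the telltale sign
ps -eLo pid,tid,comm,pcpu --sort=-pcpu | head -n 10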

 

If you trace the kernel thread, you will see continuous invocations of rht_deferred_worker.

 

kworker/15:1-3679 [019] .... 139066.264712: mutex_lock <-rht_deferred_worker
kworker/15:1-3679 [019] .... 139066.264712: _cond_resched <-mutex_lock
kworker/15:1-3679 [019] .... 139066.264712: mutex_unlock <-rht_deferred_worker
kworker/15:1-3679 [019] .... 139066.264712: queue_work_on <-rht_deferred_worker
kworker/15:1-3679 [019] d... 139066.264712: __queue_work <-queue_work_on
kworker/15:1-3679 [019] d... 139066.264713: get_work_pool <-__queue_work
kworker/15:1-3679 [019] d... 139066.264713: _raw_spin_lock <-__queue_work
kworker/15:1-3679 [019] d... 139066.264713: insert_work <-__queue_work
kworker/15:1-3679 [019] d... 139066.264713: get_pwq.isra.19 <-insert_work
…
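A trace like the one above can be captured with the kernel’s built-in function tracer. Here is a minimal sketch using the tracefs interface; it assumes tracefs is mounted at /sys/kernel/debug/tracing (the default on CentOS 7), and <kworker-tid> is a placeholder for the thread ID of the busy kworker identified earlier. Run as root:

cd /sys/kernel/debug/tracing
echo <kworker-tid> > set_ftrace_pid   # trace only the busy kworker thread
echo function > current_tracer        # enable the function tracer
head -n 20 trace_pipe                 # sample the live trace stream
echo nop > current_tracer             # turn tracing back off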

 

 

Impact of this bug on HPC Application Performance:

The performance impact is severe, especially when all CPU cores are in use. As the following tests on four HC44rs VMs using the CentOS 7.7 HPC image (kernel version 3.10.0-1062.12.1.el7) show, OpenFOAM performance varies widely from run to run. In contrast, performance is consistent when using the CentOS 7.7 HPC image with kernel version 3.10.0-1127.el7, as well as the CentOS 8.1 HPC image.

[Figure: OpenFOAM performance across repeated runs on CentOS 7.7 HPC (kernels 3.10.0-1062.12.1.el7 and 3.10.0-1127.el7) and CentOS 8.1 HPC]

 

 

 

Similar behavior can be seen in the MPI RandomAccess results shown below. The CentOS 7.7 HPC VM running kernel version 3.10.0-1062.12.1.el7 shows large performance variations, while performance is stable and consistent on the CentOS 7.7 HPC VM with kernel 3.10.0-1127.el7 and on the CentOS 8.1 HPC VM.

[Figure: MPI RandomAccess performance across repeated runs on CentOS 7.7 HPC (kernels 3.10.0-1062.12.1.el7 and 3.10.0-1127.el7) and CentOS 8.1 HPC]

 

 

 

Resolution Steps:

 

  • If you are currently using CentOS 7.6/7.7 HPC images with kernel version 3.10.0-1062.12.1.el7 or below, use one of the following methods for resolution.

1. Update Kernel: The kernel bug fix is included in 3.10.0-1127.el7.x86_64, and this update is available for both CentOS 7.6 and CentOS 7.7. To update your kernel, run the following:

 

sudo yum update kernel -y
sudo reboot
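After the VM comes back up, verify that the patched kernel is now running:

uname -r
# Expect 3.10.0-1127.el7.x86_64 or newer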

 

 

2. Switch to CentOS 8.1 HPC Image: You can switch to the CentOS 8.1 HPC image, which already includes the kernel bug fix (see the Azure CLI sketch after this list).

 

3. New CentOS 7.6/7.7 HPC Images: Updated versions of Azure’s optimized CentOS 7.6/7.7 HPC images are being prepared and will be available soon in the Azure Marketplace. You can check their availability with the Azure CLI, as sketched after this list.
- CentOS 7.7 HPC Image: OpenLogic:CentOS-HPC:7.7:7.7.2020062600, OpenLogic:CentOS-HPC:7_7-gen2:7.7.2020062601, or newer versions.

- CentOS 7.6 HPC Image: OpenLogic:CentOS-HPC:7.6:7.6.2020062900, OpenLogic:CentOS-HPC:7_6gen2:7.6.2020062901, or newer versions.
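For methods 2 and 3, the Azure CLI can be used to list the available image versions and to deploy from a specific image URN. A minimal sketch; the resource group and VM name are placeholders, and Standard_HC44rs matches the VM size used in the tests above (confirm the exact SKU names with the list command first):

# List published versions of the CentOS-HPC 7.7 SKU (swap in 7.6, 7_7-gen2, etc.)
az vm image list --publisher OpenLogic --offer CentOS-HPC --sku 7.7 --all --output table

# Deploy a VM from the CentOS 8.1 HPC image, which already carries the fix
az vm create \
  --resource-group myResourceGroup \
  --name myHpcVM \
  --size Standard_HC44rs \
  --image OpenLogic:CentOS-HPC:8_1:latest \
  --generate-ssh-keys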

 

  • If you are currently using CentOS 8.1 HPC images:
    No action is required; you are not affected by this kernel bug. If you are unsure which image a VM was deployed from, see the sketch below.
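To confirm which marketplace image a running VM was deployed from, the Azure Instance Metadata Service can be queried from inside the VM. A minimal sketch; the api-version shown is one that was current around the time of writing:

# The image publisher, offer, sku, and version appear in the JSON response
curl -s -H Metadata:true "http://169.254.169.254/metadata/instance/compute?api-version=2019-06-01"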