Optimizing Network Throughput on Azure M-series VMs
Published Jul 27 2022

1.    HCMT, niping & iperf3 Throughput Testing on Azure M-series VMs

Under some circumstances, the SAP HCMT Network Topology Test, niping, and/or iperf3 results are below the expected values in a VM hosted on Azure M-series.

The SAP HCMT output could show errors such as those in the screenshot below.

The error below is from a 5-node scale-out SAP HANA landscape.  The problem is most often seen on SAP HANA scale-out systems, but may also occur on large Oracle or DB2 systems.

The network optimization described in this blog may benefit Azure NetApp Files (ANF) scenarios as well, such as a large Oracle server running dNFS to ANF. 
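To establish a baseline before making any changes, a quick iperf3 run between two VMs is usually sufficient. A minimal sketch, assuming iperf3 is installed on both VMs and <server-ip> stands for the IP address of the receiving VM (the stream count and duration are example values only):

iperf3 -s                                # on the receiving VM (server side)
iperf3 -c <server-ip> -P 8 -t 30         # on the sending VM: 8 parallel streams for 30 seconds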

 

It is recommended that readers be fully familiar with terminology such as NUMA node, processor, core, logical processor and Hyperthreading before continuing with this blog.  These terms are explained in the links at the end of this blog.

 

[Screenshot: SAP HCMT Network Topology Test errors on a 5-node scale-out SAP HANA landscape]

 

2.    How to Analyze Network Performance

Network processing consumes a significant amount of CPU.  Modern network cards and network drivers leverage technologies such as multiple queues and Receive Side Scaling (RSS) to distribute and balance CPU consumption across multiple CPU cores.

Mellanox provides a set of tools that can “pin” network processing operations to a specific set of CPU cores or NUMA Nodes. The links at the end of this blog describe the process in more detail.

Microsoft internal testing has shown that limiting network processing queues to 16 and pinning network processing to NUMA node 0 as seen from the Linux guest OS in the VM is a configuration that delivers stable high throughput. 

Excluding the first CPU core, core 0 (logical processors 0 and 1), is recommended because many other operations are affinitized to this core.

 

The procedure detailed in this blog has been implemented successfully many times, but careful testing should be done before implementation (to establish a baseline) and again after implementation.

The procedure below should not be implemented on a production system without careful testing and validation in non-production systems.  The procedure in this blog only applies to Azure M-series (M, Mv2 and similar).

Test scenarios have shown network throughput improvements of approximately 20%.  

 

Most Linux tools, such as ‘vmstat’, show averages and are not useful for characterizing network interrupt utilization. It is recommended to install ‘nmon’ and sysstat (sar).

  1. Unfortunately, nmon is not available via zypper, apt, or yum and must be downloaded (http://nmon.sourceforge.net/pmwiki.php)
  2. sysstat (sar) may or may not be installed and activated by default.  Typically, most SUSE gallery images have sar running by default.  Check the directory /var/log/sa.  If the directory does not exist or does not contain recent saXX/sarXX files, follow the steps below (a quick check is also sketched just after this list)
  3. KSAR is a graphical tool that presents system performance information in a simple, easy-to-interpret way.  This tool requires a Java runtime (JVM): https://github.com/vlsi/ksar
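A quick way to confirm that sar data is actually being collected (the saXX/sarXX file names vary by distribution):

ls -l /var/log/sa/        # recent saXX / sarXX files indicate sar is collecting data
sar -u 5 3                # live check: 3 CPU utilization samples, 5 seconds apart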

If sysstat needs to be installed, follow the steps below

# sudo yum install sysstat

# sudo service sysstat restart

Redirecting to /bin/systemctl restart sysstat.service
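On SUSE the equivalent installation uses zypper; the sketch below assumes the sysstat package provides a systemd service of the same name (service and timer names can vary by sysstat version):

sudo zypper install sysstat
sudo systemctl enable --now sysstat      # start data collection now and enable it at boot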

The /var/log/sa/sarXX files can be copied onto a Windows PC with sftp

sftp -i <keyfilename>.pem azureuser@<xx.xx.xx.xx>

get /var/log/sa/sar<XX>

 

Run "Java -jar C:\sap_media\ksar.jar" and capture graphs showing overall CPU consumption, then consumption on the first NUMA node (node 0) and Kernel/System CPU consumption on each Logical Processor.

 

An optimal pattern in KSAR shows approximately the same CPU consumption on all NUMA nodes and logical processors.  NUMA node 0 and logical processors 0 and 1 may show slightly higher CPU time, especially system CPU time (kernel time).

CPU consumption over 80% on some NUMA nodes and/or logical processors, combined with near 0-10% consumption on other NUMA nodes and/or logical processors, indicates a problem.
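The same imbalance can also be spotted directly on the VM with sar, either live or from a collected file. sa<XX> below is a placeholder for one of the files in /var/log/sa:

sar -P ALL 10 6                       # live: per-CPU utilization (%system = kernel time), 6 samples at 10-second intervals
sar -P ALL -f /var/log/sa/sa<XX>      # historical: per-CPU utilization read back from a collected file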

 

nmon can be used to watch CPU consumption in real time and can help isolate high-CPU processes.
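nmon is interactive; after starting it, single-key toggles switch on the relevant views:

nmon        # then press: c = per-CPU utilization, n = network, t = top processes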

 

3.    How to Optimize Network Interrupts on Azure M-series VMs

Follow this procedure to resolve the problem.  This procedure “pins” the interrupt processing onto the first NUMA node (node 0).  The first CPU core (logical processors 0 and 1) is excluded because these logical processors may be heavily used by other operating system components.

1. Download the Mellanox tools using wget:
                           wget https://github.com/Mellanox/mlnx-tools/archive/refs/tags/v5.1.3.zip

                           Unzip the file.  The two scripts used in this blog are:
                           a. set_irq_affinity_cpulist.sh
                           b. show_irq_affinity.sh
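The download and extraction can be scripted as shown below. The extracted directory name is an assumption based on the v5.1.3 tag, so the find command is used to locate the two scripts regardless of the archive layout:

                           wget https://github.com/Mellanox/mlnx-tools/archive/refs/tags/v5.1.3.zip
                           unzip v5.1.3.zip
                           cd mlnx-tools-*
                           find . -name "set_irq_affinity_cpulist.sh" -o -name "show_irq_affinity.sh"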

2. Turn off the irqbalance service:
                           systemctl stop irqbalance
                           systemctl disable irqbalance

and then double-check that it is really inactive:
                           systemctl status irqbalance

[Screenshot: systemctl status irqbalance showing the service as inactive]

 

  3. Run the command “ip link ls” to find the interface names for Accelerated Networking. In this example there are three vNICs, but Accelerated Networking is only enabled for two of them. Accelerated Networking interfaces are identified by the keyword “SLAVE”.

    Moving interrupts with the Mellanox tools is a concept that only applies to Accelerated Networking interfaces.
    [Screenshot: “ip link ls” output with the Accelerated Networking interfaces marked SLAVE]

     

    4. Run the command “lscpu | grep NUMA” to find the range of vCPU numbers for virtual NUMA node 0. In this example it is vCPU 0 to vCPU 51 (52 logical processors running on 26 cores with Hyperthreading enabled).

[Screenshot: “lscpu | grep NUMA” output showing NUMA node 0 with vCPUs 0-51]

 

  5. Run the Mellanox script and check on which vCPUs the interrupts of a specific Accelerated Networking interface are running:
                               ./show_irq_affinity.sh eth3 show_cpu_number (note: there are underscores in this command)
    In this example the interrupts start with interrupt #24 on vCPU 0. Interrupt #24 can in fact run on all vCPUs of virtual NUMA node 0, whereas the other interrupts are pinned to vCPUs 0 through 30.
    [Screenshot: show_irq_affinity.sh output for eth3]

    6. The interrupt numbers of the second Accelerated Networking interface start with #56, but they sit on the same vCPUs as the interrupts of the first Accelerated Networking interface.  This is normal.
    [Screenshot: show_irq_affinity.sh output for the second Accelerated Networking interface]

Note: depending on the workload, reducing the number of queues via the command “ethtool” may improve performance. The command below reduces the number of queues to 16; run it for each of the two Accelerated Networking interfaces:

 
                           ethtool -L <Accelerated Networking interface> combined 16
After reducing the queues, place the interrupts on separate vCPUs on virtual NUMA node 0 as described in step 7 below.
The command above is not permanent.  To make this setting permanent, follow the procedure below.

Prior to implementing this change, display the current number of queues with the command ethtool -l eth0

 

SUSE

/etc/sysconfig/network/ifcfg-eth0

Add the line: ETHTOOL_OPTIONS='-L eth0 combined 16'

How to configure network parameters for AutoYaST network installation | Support | SUSE
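A minimal sketch of the SUSE change, assuming the interface is eth0 and is managed by wicked (the restart command may differ on your image; a reboot achieves the same):

echo "ETHTOOL_OPTIONS='-L eth0 combined 16'" | sudo tee -a /etc/sysconfig/network/ifcfg-eth0
sudo systemctl restart wicked            # then re-check with: ethtool -l eth0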

 

Red Hat

/etc/sysconfig/network-scripts/ifcfg-eth0

On Red Hat 8 or later, the following procedure can be used to permanently reduce the queues to 16.

Step #1.  Create a file in /etc/NetworkManager/dispatcher.d/set_combined_queues

Step #2.  Enter a comment to document the script file (such as the link to this blog)

Step #3.  Enter the lines:

#!/bin/sh

/sbin/ethtool -L eth0 combined 16

Step #4.  Save the file and exit 

Step #5.  Change the mode of the file to allow execution - suggested is chmod 751

Step #6.  Restart the Red Hat OS, then run ethtool -l eth0 and confirm the queues are correctly configured
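Steps 1 through 6 can be condensed into the sketch below. It assumes eth0 is the Accelerated Networking interface and uses the file name and mode suggested above; NetworkManager dispatcher scripts also receive the interface name and action as arguments, which a more selective script could check before calling ethtool:

sudo tee /etc/NetworkManager/dispatcher.d/set_combined_queues <<'EOF'
#!/bin/sh
# Reduce the Accelerated Networking queues to 16 (see: Optimizing Network Throughput on Azure M-series VMs)
/sbin/ethtool -L eth0 combined 16
EOF
sudo chmod 751 /etc/NetworkManager/dispatcher.d/set_combined_queues
sudo reboot                              # afterwards confirm with: ethtool -l eth0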

 

Note: There are differences in syntax and quoting between SUSE and Red Hat. Replace eth0 with the active Accelerated Networking interface.

 

7. Use the Mellanox script “set_irq_affinity_cpulist.sh” to modify the vCPUs on which the interrupts of a specific Accelerated Networking interface are running. This command can “pin” or move them all to virtual NUMA node 0. As the guest OS usually uses logical processors 0 and 1 for specific tasks, it is recommended to let the interrupts start at logical processor 2 (the first logical processor on core 1, i.e. the second core on the processor).
                           ./set_irq_affinity_cpulist.sh 2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33 eth3

[Screenshot: set_irq_affinity_cpulist.sh run for eth3]
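Typing the full cpulist by hand is error-prone; the same call can be generated with seq (a convenience only, not part of the Mellanox tooling):

                           ./set_irq_affinity_cpulist.sh $(seq -s, 2 33) eth3      # expands to 2,3,...,33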

 

  8. Run the command below to validate that the IRQs (interrupts) have moved to NUMA node 0:

                           ./show_irq_affinity.sh eth3 show_cpu_number (note: there are underscores in this command)
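The kernel's own view can be checked as well; each interrupt of the interface should now be allowed only on vCPUs of NUMA node 0. Interrupt naming in /proc/interrupts varies, so grep for the interface name or for mlx; <irq number> is a placeholder:

                           grep -E "eth3|mlx" /proc/interrupts               # list the interrupt numbers used by the interface
                           cat /proc/irq/<irq number>/smp_affinity_list      # allowed vCPUs for one interrupt, e.g. 2-33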
 

4.    Validate Performance After Setting IRQ Affinity

Repeat the SAP HCMT, niping, and/or iperf3 tests.

In addition, it is recommended to repeat the analysis with KSAR and nmon.

In most cases SAP HCMT and other tools will show approximately 20% improvement.  If there are still performance concerns open a Microsoft Support case and quote this blog.

Thanks to Hermann Daeubler for contributing to this blog.

 

Interesting SAP OSS Notes and Links

Windows 2008 R2 - Groups, Processors, Sockets, Cores Threads, NUMA nodes what is all this? - Microso...

GitHub - Mellanox/mlnx-tools: Mellanox userland tools and scripts

SAP HANA Hardware and Cloud Measurement Tools (HCMT) – Replacement of HWCCT Tool | SAP Blogs

How to Use the SAP HANA Hardware and Cloud Measurement Tools

2973899 - Running HCMT - HANA Hardware and Cloud Measurement Tools - SAP ONE Support Launchpad

SAP HANA Hardware and Cloud Measurement Tools | SAP Help Portal

SAP HANA Hardware and Cloud Measurement Analysis (ondemand.com)

Queues, RSS, interrupts and cores (mellanox.com)

Performance Configuration and Troubleshooting (mellanox.com)

What is IRQ Affinity? (mellanox.com)

500235 - Network Diagnosis with NIPING - SAP ONE Support Launchpad

How to test network throughput using iperf3 tool - Tutorials and How To - CloudCone

 

Short throughput/stability test
Server: niping -s -I 0 (the last character is zero, not the letter O)
Client: niping -c -H <nipingsvr> -B 1000000 -L 100

Measuring throughput
Server: niping -s -I 0 (the last character is zero, not the letter O)
Client: niping -c -H <nipingsvr> -B 100000

 

 

3rd party content in this blog is used under “fair use” copyright exception for the purpose of promoting scholarship, discussion, research, learning and education

 

 

 

 

 

 

 

 
