Under some circumstances, the SAP HCMT Network Topology Test, niping and/or iperf3 report results below the expected values in a VM hosted on Azure M-series.
The SAP HCMT output could show errors such as those in the screenshot below.
The error below shows a 5-node scale-out SAP HANA landscape. The problem is most often seen on SAP HANA scale-out systems, but may occur on large Oracle or DB2 systems.
The network optimization described in this blog may benefit Azure NetApp Files (ANF) scenarios as well, such as a large Oracle server running dNFS to ANF.
It is recommended readers be fully familiar with the terminology such as NUMA Node, Processor, Core, Logical Processor and Hyperthreading before continuing to read this blog. These terms are explained here
Network processing consumes a significant amount of CPU. Modern Network Cards and Network Drivers leverage technologies to distribute and balance CPU consumption across multiple CPU cores.
Mellanox provides a set of tools that can “pin” network processing operations to a specific set of CPU cores or NUMA Nodes. The links at the end of this blog describe the process in more detail.
Microsoft internal testing has shown that limiting network processing queues to 16 and pinning network processing to NUMA node 0 as seen from the Linux guest OS in the VM is a configuration that delivers stable high throughput.
Excluding the first CPU core, core 0 (logical processors 0 and 1), is recommended because many other operations are affinitized to this core.
The procedure detailed in this blog has been implemented many times successfully, but careful testing should be done prior to implementing (to establish a baseline) and after implementation.
The procedure below should not be implemented on a production system without careful testing and validation in non-production systems. The procedure in this blog only applies to Azure M-series (M, Mv2 and similar).
Test scenarios have shown network throughput improvements of approximately 20%.
Most Linux tools, such as ‘vmstat’, show averages and are not useful for characterizing network interrupt utilization. It is recommended to install ‘nmon’ and sysstat (sar).
If sysstat needs to be installed, follow the steps below:
# sudo yum install sysstat
# sudo service sysstat restart
Redirecting to /bin/systemctl restart sysstat.service
The /var/log/sa/sarXX files can be copied onto a Windows PC with sftp
sftp -i <keyfilename>.pem azureuser@<xx.xx.xx.xx>
get /var/log/sa/sar<XX>
Run "Java -jar C:\sap_media\ksar.jar" and capture graphs showing overall CPU consumption, then consumption on the first NUMA node (node 0) and Kernel/System CPU consumption on each Logical Processor.
An optimal pattern in KSAR will show approximately the same CPU consumption on all NUMA Nodes and Logical Processors. NUMA Node 0 and Logical Processors 0 and 1 may have slightly higher CPU time, especially System CPU time (kernel time).
CPU consumption over 80% on some NUMA Nodes and/or Logical Processors and simultaneously near 0-10% consumption on other NUMA Nodes and/or Logical Processors indicates a problem.
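The pattern described above can also be checked with a small script. The following is a minimal sketch and not part of the original procedure: the function name check_cpu_skew and its output format are hypothetical, and the 80% and 10% thresholds are simply the figures quoted above. It reads one "CPU busy-percent" pair per line, so it does not depend on a particular sar column layout.

```shell
#!/bin/sh
# check_cpu_skew [HIGH] [LOW]  (hypothetical helper)
# Reads "cpu busy_percent" pairs on stdin. Prints SKEWED when at least one
# CPU is above HIGH while at least one other is at or below LOW, else OK.
check_cpu_skew() {
    awk -v high="${1:-80}" -v low="${2:-10}" '
        NF < 2             { next }      # skip blank or incomplete lines
        $2 + 0 >  high + 0 { hot++ }     # heavily loaded logical processor
        $2 + 0 <= low + 0  { cold++ }    # near-idle logical processor
        END {
            if (hot > 0 && cold > 0)
                printf "SKEWED hot=%d cold=%d\n", hot, cold
            else
                print "OK"
        }'
}

# One possible way to feed it (assumes sysstat is installed and that the
# summary lines start with "Average:", which is the case in the C locale):
#   LC_ALL=C sar -P ALL 1 5 |
#     awk '$1 == "Average:" && $2 ~ /^[0-9]+$/ { print $2, 100 - $NF }' |
#     check_cpu_skew
```

A SKEWED result only indicates that the load is unevenly distributed; KSAR or nmon is still needed to see which component is consuming the CPU time.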
Nmon can be used to watch CPU consumption in real time and can help isolate high-CPU processes.
Follow this procedure to resolve the problem. This procedure “pins” the interrupt processing onto the first NUMA Node (Node 0). The first CPU core (logical processors 0 and 1) is excluded, as these logical processors may be heavily used by other Operating System components.
1. Download the Mellanox tools using wget:
wget https://github.com/Mellanox/mlnx-tools/archive/refs/tags/v5.1.3.zip
Unzip the file. The two scripts used in this blog are:
a. set_irq_affinity_cpulist.sh
b. show_irq_affinity.sh
2. Turn off the irqbalance service:
systemctl stop irqbalance
systemctl disable irqbalance
and then double-check that it’s really inactive:
systemctl status irqbalance
3. Run the command “ip link ls” to identify the interface names for Accelerated Networking. In this example there are three vNICs, but Accelerated Networking is only turned on for two of them. Accelerated Networking interfaces are identified by the keyword “SLAVE”.
Moving interrupts with the Mellanox tools is a concept that only applies to Accelerated Networking interfaces.
4. Run the command “lscpu | grep NUMA” to find the range of vCPU numbers for virtual NUMA node 0. In this example it is from vCPU 0 to vCPU 51 (52 logical processors running on 26 cores with Hyperthreading enabled).
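For scripting, the vCPU range of virtual NUMA node 0 can be extracted from the lscpu output. This is a minimal sketch: the function name numa0_cpus is hypothetical, and it assumes the English lscpu label "NUMA node0 CPU(s):".

```shell
#!/bin/sh
# numa0_cpus (hypothetical helper)
# Reads lscpu-style text on stdin and prints the CPU list of NUMA node 0,
# for example "0-51".
numa0_cpus() {
    awk -F: '/^NUMA node0 CPU\(s\)/ {
        gsub(/[ \t]/, "", $2)   # drop the padding after the colon
        print $2
    }'
}

# Typical use on the VM:
#   LC_ALL=C lscpu | numa0_cpus
```

On most Linux kernels the same list can also be read from /sys/devices/system/node/node0/cpulist.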
Note: depending on the workload, reducing the number of queues via the command “ethtool” may improve performance. The following command, run once for each Accelerated Networking interface, reduces the number of queues to 16:
ethtool -L <Accelerated Networking interface> combined 16
Then place the interrupts on separate vCPUs on virtual NUMA node 0.
The command above is not permanent. To make this setting permanent, follow this procedure.
Prior to implementing this change, display the current number of queues with the command: ethtool -l eth0
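To record the baseline in a script, the current queue count can be picked out of the ethtool -l output. This is a minimal sketch with a hypothetical function name, current_combined. It assumes the usual ethtool layout, in which a "Pre-set maximums:" section is followed by a "Current hardware settings:" section that contains a "Combined:" line.

```shell
#!/bin/sh
# current_combined (hypothetical helper)
# Reads `ethtool -l <interface>` output on stdin and prints the Combined
# value from the "Current hardware settings" section only.
current_combined() {
    awk '
        /^Current hardware settings/ { current = 1; next }
        current && /^Combined:/      { print $2; exit }
    '
}

# Typical use on the VM (eth0 is the example interface used in this blog):
#   ethtool -l eth0 | current_combined
```

Running this before and after the change makes it easy to confirm that the setting took effect and that it survives a reboot.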
SUSE
/etc/sysconfig/network/ifcfg-eth0
Add the line: ETHTOOL_OPTIONS='-L eth0 combined 16'
How to configure network parameters for AutoYaST network installation | Support | SUSE
Red Hat
/etc/sysconfig/network-scripts/ifcfg-eth0
On RedHat 8 or above the following procedure can be used to permanently reduce the queues to 16.
Step #1. Create a file in /etc/NetworkManager/dispatcher.d/set_combined_queues
Step #2. Enter a comment to document the script file (such as the link to this blog)
Step #3. Enter the lines:
#!/bin/sh
/sbin/ethtool -L eth0 combined 16
Step #4. Save the file and exit
Step #5. Change the mode of the file to allow execution (chmod 751 is suggested)
Step #6. Restart the Red Hat OS and run ethtool -l eth0 and confirm the queues are correctly configured
Note: There are differences in syntax and quotes between SUSE and Red Hat. Replace eth0 with the name of the active Accelerated Networking interface.
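A slightly more defensive variant of the Red Hat dispatcher script is sketched below. It is an illustration and not the script from the steps above. It relies on NetworkManager passing the interface name as the first argument and the action as the second argument to dispatcher scripts, and it only calls ethtool when the chosen interface comes up. The names IFACE_TARGET, QUEUES, apply_queues and the ETHTOOL override are hypothetical.

```shell
#!/bin/sh
# Hypothetical variant of /etc/NetworkManager/dispatcher.d/set_combined_queues
# NetworkManager calls dispatcher scripts with: $1 = interface, $2 = action.

IFACE_TARGET="eth0"                  # replace with the Accelerated Networking interface
QUEUES=16                            # queue count recommended in this blog
ETHTOOL="${ETHTOOL:-/sbin/ethtool}"  # override (e.g. ETHTOOL=echo) for a dry run

apply_queues() {
    # Act only on the "up" event of the target interface; ignore all others.
    if [ "$1" = "$IFACE_TARGET" ] && [ "$2" = "up" ]; then
        "$ETHTOOL" -L "$IFACE_TARGET" combined "$QUEUES"
    fi
}

apply_queues "${1:-}" "${2:-}"
```

The guard avoids re-running ethtool for unrelated interfaces and events. As with the original script, confirm the result after a reboot with ethtool -l.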
7. Use the Mellanox script “set_irq_affinity_cpulist.sh” to modify the vCPUs on which the interrupts of a specific Accelerated Networking interface are running. This command can “pin” or move them all to virtual NUMA node 0. As the guest OS is usually using Logical Processor 0 and 1 for specific tasks it is recommended to let the interrupts start with Logical Processor 2 (this is the first logical processor on core 1, which is the second core on the processor).
./set_irq_affinity_cpulist.sh 2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33 eth3
./show_irq_affinity.sh eth3 show_cpu_number (note: there are underscores in this command)
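The long comma-separated list in the command above can be generated rather than typed. This is a minimal sketch with a hypothetical helper name, pin_list. It prints a run of consecutive logical processors and assumes that all of them belong to virtual NUMA node 0, which should be checked against the lscpu output first.

```shell
#!/bin/sh
# pin_list FIRST COUNT (hypothetical helper)
# Prints COUNT consecutive logical processor numbers starting at FIRST,
# separated by commas, in the form used by the command above.
pin_list() {
    awk -v first="$1" -v count="$2" 'BEGIN {
        for (i = 0; i < count; i++)
            printf "%s%d", (i ? "," : ""), first + i
        print ""
    }'
}

# Example matching the command above (logical processors 2 to 33):
#   ./set_irq_affinity_cpulist.sh "$(pin_list 2 32)" eth3
```

Starting at 2 keeps logical processors 0 and 1 free, as recommended above.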
Repeat the SAP HCMT, niping and/or iperf3 tests.
In addition, it is recommended to repeat the analysis with KSAR and Nmon.
In most cases SAP HCMT and other tools will show approximately 20% improvement. If there are still performance concerns, open a Microsoft Support case and quote this blog.
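To compare the baseline with the result after the change, the relative improvement can be computed from any two throughput figures in the same unit. This is a minimal sketch with a hypothetical helper name, pct_change. The numbers in the usage comment are made-up placeholders, not measurements.

```shell
#!/bin/sh
# pct_change BEFORE AFTER (hypothetical helper)
# Prints the relative change from BEFORE to AFTER in percent, one decimal.
pct_change() {
    awk -v before="$1" -v after="$2" 'BEGIN {
        if (before + 0 == 0) { print "n/a"; exit }   # avoid division by zero
        printf "%.1f\n", (after - before) / before * 100
    }'
}

# Example with placeholder values (same unit for both arguments):
#   pct_change 10 12
```

Both measurements should come from the same tool with the same parameters, otherwise the percentage is not meaningful.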
Thanks to Hermann Daeubler for contributing to this blog.
GitHub - Mellanox/mlnx-tools: Mellanox userland tools and scripts
SAP HANA Hardware and Cloud Measurement Tools (HCMT) – Replacement of HWCCT Tool | SAP Blogs
How to Use the SAP HANA Hardware and Cloud Measurement Tools
2973899 - Running HCMT - HANA Hardware and Cloud Measurement Tools - SAP ONE Support Launchpad
SAP HANA Hardware and Cloud Measurement Tools | SAP Help Portal
SAP HANA Hardware and Cloud Measurement Analysis (ondemand.com)
Queues, RSS, interrupts and cores (mellanox.com)
Performance Configuration and Troubleshooting (mellanox.com)
What is IRQ Affinity? (mellanox.com)
500235 - Network Diagnosis with NIPING - SAP ONE Support Launchpad
How to test network throughput using iperf3 tool - Tutorials and How To - CloudCone
Short throughput/stability test
Commands
Server: niping -s -I 0 (the last character is zero, not the letter O)
Client: niping -c -H <nipingsvr> -B 1000000 -L 100
Measuring throughput
Commands
Server: niping -s -I 0 (the last character is zero, not the letter O)
Client: niping -c -H <nipingsvr> -B 100000
3rd party content in this blog is used under “fair use” copyright exception for the purpose of promoting scholarship, discussion, research, learning and education