Analyzing a HANGMAN log file, part 1

Published Mar 13 2019 08:24 AM 153 Views
First published on MSDN on Oct 24, 2008

The HANGMAN log file, created with the HANGMAN tool described in SAP note 948633 gives us a better understanding about the root cause of a system hang or hiccup of an SAP system. The log file consists of several sections: SAP process list per instance, Windows process list per server, SQL process list, database locks and some additional information (cache sizes and hit ratio, scheduler info, I/O response time per file).

In the following examples we only will have a look at the most interesting columns of each list. In real life, the lists contain much more information in additional columns. You can see this in the attached hangman log file HANGMAN__2008-10-07__06-15-37__650.log. This log file was created on a test system (SAP 7.00 ABAP + JAVA stack, SQL Server 2008) while some external JAVA programs executed blocking and blocked SQL statements. The intention of this was to get a more interesting log file containing (indirectly) blocked statements. Normally you will not see external programs accessing the tables of an SAP database.

The most common SAP modules are based on an ABAP stack. You also may use HANGMAN for a pure JAVA stack, for example for an SAP Enterprise Portal system. However, in this case you will not see any (JAVA) work process list. In the following “SAP work process” is always related to ABAP. Let’s have a closer look on the main sections of a HANGMAN log file:

SAP instance list
This list gives you an overview of the whole SAP system. The output is similar to SAP transaction SM51. You can see the hostname, SAP instance number and installed services for each running SAP instance. Since this information is retrieved from the SAP message server a stopped SAP instance is not listed here.

--------------------------------- ---------------------
SAPAPP1_SR3_00  sapapp1  sapdp00  DIA UPD BTC UP2
SAPAPP2_SR3_02  sapapp2  sapdp02  DIA BTC
SAPCI_SR3_00    sapci    sapdp00  DIA UPD ENQ BTC SPO ICM

SAP work process list per instance
The columns of the SAP work process list are separated by tabs. Therefore the tool used to open the HANGMAN log file affects the layout. All other lists use spaces to separate the columns. The SAP work process list is collected remotely using the COM interface of SAP kernel 4.5B and newer. Since this is collected by of the dispatcher process of a particular instance, a free SAP dialog process is not needed to retrieve the data. The output looks similar to the output of SAP transaction SM50:

No Typ  Pid  Status Reason Sem Time Program  Client User Action      Table
0  DIA  388  Wait
1  BTC  4624 Run               472  SAPLIBIN 100    MIKE Direct Read ZSHIP
2  BTC  5028 Stop   RFC        4212 RSCOLL00 000    DDIC
3  BTC  2252 Stop   RFC        1271 SAPLSALC 100    DDIC
4  BTC  264  Stop   ARFC       2471          000    SAM
5  UP2  2400 Wait

This is typically the starting point of any analysis in an ABAP system, because you can see here the state of the system from an SAP point of view. You can figure out the name of an SAP user, which is performing a database select. “Pid” is the Windows process ID running the ABAP work process.
A work process containing the value “Direct Read” or “Sequential Read” in column “Action” is waiting on a database SELECT. “Sequential Read” simply means that an ABAP “select single” is executed, which results in a “select top 1” on SQL Server. This has nothing to do with the execution plan used by SQL Server to run the SELECT. Customers are often confused about this and assume that SQL Server is performing a full table scan once they see long rung sequential reads in SM50.

SAP work process queue statistics
The HANGMAN log file contains the SAP work process list of each SAP instance twice in order to get a better idea what is going on in the system. Between the two snapshots of the work process list you can see the queue statistics for each type of work process.

Typ  Now  High Max
NOWP 0    3    2000
DIA  0    4    2000
UPD  0    1    2000
ENQ  0    0    2000
BTC  0    1    2000
SPO  0    0    2000
UP2  0    0    2000

The most interesting types are DIA (dialog), BTC (batch) and UPD (update). “Now” tells you the number of requests waiting on a free SAP work process at this moment. Any value greater zero means that all work processes of the according type are in use. “High” is the highest number which occurred since the SAP instance started. “Max” is the configured queue length, which is typically large enough.

Windows process list
The Windows process list is collected for each ABAP application server and the database server. Running two SAP instances and SQL Server on the same server will result in one single Windows process list (not in three).

Proc                                       DeltaUsr DeltaKrn
Id ProcessTree            Thread WorkingSet  Time Time   ProcessPath
0 System Idle Process         8      28672     0 5140
4 System                     72      36864     0   15
744   smss.exe                  2     274432     0    0
984     csrss.exe              16    4546560     0    0  C:\WINDOWS\s
1200     winlogon.exe           18    4390912     0    0  C:\WINDOWS\s
1340       services.exe         19    2314240     0   62  C:\WINDOWS\s
1832         SAPOSCOL.EXE        4   15220736     0    0  F:\usr\sap\P
2064         SAPSTARTSRV.EXE     9   22855680    15    0  F:\usr\sap\P
3456           disp+work.exe     3   16351232     0    0  F:\usr\sap\P
292             gwrd.exe        4    5828608     0    0  F:\usr\sap\P
380             icman.exe      54    9572352     0    0  F:\usr\sap\P
388             disp+work.exe   5  363868160     0   15  F:\usr\sap\P
4624             disp+work.exe   5  363728896   750   15  F:\usr\sap\P
1556             disp+work.exe   5  196173824     0    0  F:\usr\sap\P

HANGMAN uses WMI to get Windows process information twice within a time frame. It then calculates a delta of user/kernel time, physical reads/writes and page faults. In the example above you can see that the SAP work process with windows process id 4624 consumed 750ms CPU time in user mode and 15ms in kernel mode during the time frame. The length of the time frame is not fixed and depends on the speed and workload of the whole SAP system. Means it is dependent on the execution time of the HANGMAN script. You can calculate the time frame by adding all user and kernel times of all processes, including the system  idle process (and dividing the result by the number of CPU cores). In the example the time frame (delta time) is about 6000ms.
By comparing the process ID (here: 4624) of the Windows process list with the SAP process list, you can see that it was SAP work process ID 1 (a batch job) which consumed the 750ms CPU time in user mode.
The Windows process list of the database server contains the process SQLSERVR.EXE and its total number of threads. This includes system and worker threads. Seeing a huge number of threads here while the SQL process list is missing in the same HANGMAN log file is an indication for SQL Server running out of worker threads. In this case you should restart HANGMAN with the DAC (dedicated admin connection) parameter.

The next time we will have a further look on the SQL Server specific sections of a HANGMAN log file.


Version history
Last update:
‎Mar 13 2019 08:24 AM
Updated by: