The HANGMAN log file, created with the HANGMAN tool described in SAP note 948633 gives us a better understanding about the root cause of a system hang or hiccup of an SAP system. The log file consists of several sections: SAP process list per instance, Windows process list per server, SQL process list, database locks and some additional information (cache sizes and hit ratio, scheduler info, I/O response time per file).
In the following examples we only will have a look at the most interesting columns of each list. In real life, the lists contain much more information in additional columns. You can see this in the attached hangman log file HANGMAN__2008-10-07__06-15-37__650.log. This log file was created on a test system (SAP 7.00 ABAP + JAVA stack, SQL Server 2008) while some external JAVA programs executed blocking and blocked SQL statements. The intention of this was to get a more interesting log file containing (indirectly) blocked statements. Normally you will not see external programs accessing the tables of an SAP database.
The most common SAP modules are based on an ABAP stack. You also may use HANGMAN for a pure JAVA stack, for example for an SAP Enterprise Portal system. However, in this case you will not see any (JAVA) work process list. In the following “SAP work process” is always related to ABAP. Let’s have a closer look on the main sections of a HANGMAN log file:
SAP instance list
This list gives you an overview of the whole SAP system. The output is similar to SAP transaction SM51. You can see the hostname, SAP instance number and installed services for each running SAP instance. Since this information is retrieved from the SAP message server a stopped SAP instance is not listed here.
CLIENT-NAME HOST SERVICE SERVICES (SAP)
--------------------------------- ---------------------
SAPAPP1_SR3_00 sapapp1 sapdp00 DIA UPD BTC UP2
SAPAPP2_SR3_02 sapapp2 sapdp02 DIA BTC
SAPCI_SR3_00 sapci sapdp00 DIA UPD ENQ BTC SPO ICM
SAP work process list per instance
The columns of the SAP work process list are separated by tabs. Therefore the tool used to open the HANGMAN log file affects the layout. All other lists use spaces to separate the columns. The SAP work process list is collected remotely using the COM interface of SAP kernel 4.5B and newer. Since this is collected by of the dispatcher process of a particular instance, a free SAP dialog process is not needed to retrieve the data. The output looks similar to the output of SAP transaction SM50:
No Typ Pid Status Reason Sem Time Program Client User Action Table
0 DIA 388 Wait
1 BTC 4624 Run 472 SAPLIBIN 100 MIKE Direct Read ZSHIP
2 BTC 5028 Stop RFC 4212 RSCOLL00 000 DDIC
3 BTC 2252 Stop RFC 1271 SAPLSALC 100 DDIC
4 BTC 264 Stop ARFC 2471 000 SAM
5 UP2 2400 Wait
This is typically the starting point of any analysis in an ABAP system, because you can see here the state of the system from an SAP point of view. You can figure out the name of an SAP user, which is performing a database select. “Pid” is the Windows process ID running the ABAP work process.
A work process containing the value “Direct Read” or “Sequential Read” in column “Action” is waiting on a database SELECT. “Sequential Read” simply means that an ABAP “select single” is executed, which results in a “select top 1” on SQL Server. This has nothing to do with the execution plan used by SQL Server to run the SELECT. Customers are often confused about this and assume that SQL Server is performing a full table scan once they see long rung sequential reads in SM50.
SAP work process queue statistics
The HANGMAN log file contains the SAP work process list of each SAP instance twice in order to get a better idea what is going on in the system. Between the two snapshots of the work process list you can see the queue statistics for each type of work process.
Typ Now High Max
NOWP 0 3 2000
DIA 0 4 2000
UPD 0 1 2000
ENQ 0 0 2000
BTC 0 1 2000
SPO 0 0 2000
UP2 0 0 2000
The most interesting types are DIA (dialog), BTC (batch) and UPD (update). “Now” tells you the number of requests waiting on a free SAP work process at this moment. Any value greater zero means that all work processes of the according type are in use. “High” is the highest number which occurred since the SAP instance started. “Max” is the configured queue length, which is typically large enough.
Windows process list
The Windows process list is collected for each ABAP application server and the database server. Running two SAP instances and SQL Server on the same server will result in one single Windows process list (not in three).
Proc DeltaUsr DeltaKrn
Id ProcessTree Thread WorkingSet Time Time ProcessPath
0 System Idle Process 8 28672 0 5140
4 System 72 36864 0 15
744 smss.exe 2 274432 0 0
984 csrss.exe 16 4546560 0 0 C:\WINDOWS\s
1200 winlogon.exe 18 4390912 0 0 C:\WINDOWS\s
1340 services.exe 19 2314240 0 62 C:\WINDOWS\s
1832 SAPOSCOL.EXE 4 15220736 0 0 F:\usr\sap\P
2064 SAPSTARTSRV.EXE 9 22855680 15 0 F:\usr\sap\P
3456 disp+work.exe 3 16351232 0 0 F:\usr\sap\P
292 gwrd.exe 4 5828608 0 0 F:\usr\sap\P
380 icman.exe 54 9572352 0 0 F:\usr\sap\P
388 disp+work.exe 5 363868160 0 15 F:\usr\sap\P
4624 disp+work.exe 5 363728896 750 15 F:\usr\sap\P
1556 disp+work.exe 5 196173824 0 0 F:\usr\sap\P
HANGMAN uses WMI to get Windows process information twice within a time frame. It then calculates a delta of user/kernel time, physical reads/writes and page faults. In the example above you can see that the SAP work process with windows process id 4624 consumed 750ms CPU time in user mode and 15ms in kernel mode during the time frame. The length of the time frame is not fixed and depends on the speed and workload of the whole SAP system. Means it is dependent on the execution time of the HANGMAN script. You can calculate the time frame by adding all user and kernel times of all processes, including the system idle process (and dividing the result by the number of CPU cores). In the example the time frame (delta time) is about 6000ms.
By comparing the process ID (here: 4624) of the Windows process list with the SAP process list, you can see that it was SAP work process ID 1 (a batch job) which consumed the 750ms CPU time in user mode.
The Windows process list of the database server contains the process SQLSERVR.EXE and its total number of threads. This includes system and worker threads. Seeing a huge number of threads here while the SQL process list is missing in the same HANGMAN log file is an indication for SQL Server running out of worker threads. In this case you should restart HANGMAN with the DAC (dedicated admin connection) parameter.
The next time we will have a further look on the SQL Server specific sections of a HANGMAN log file.
https://msdnshared.blob.core.windows.net/media/MSDNBlogsFS/prod.evol.blogs.msdn.com/CommunityServer.Components.PostAttachments/00/08/99/13/85/HANGMAN__2008-10-07__06-15-37__650.log