Blog Post

SQL Server Blog
13 MIN READ

Using SQL Server’s SNITrace to Troubleshoot Networking Issues

BobDorr's avatar
BobDorr
Icon for Microsoft rankMicrosoft
Feb 09, 2022
Moved from: bobsql.com

 

SNITrace, formally BIDTrace, is an deeper level of tracing for the Microsoft SQL Server TDS consumers and providers.

 

Terms and Brief History

Before I dive into the details here is a brief history and related terminology.

 

BID

Built-In Diagnostics (BID) is the former name replaced with SNI.  The original tracing was added as BIDTrace and since then many of the functions and macros were renamed to SNITrace* as improvements were made and features added.

SNI

Server Network Interface (SNI) is centralized code, shared among the SQL Server and SQL Server client providers, both Windows and Linux that can send, parse and act upon TDS communications.

ODS

Open Data Services (ODS) was a separate, public API set allowing a TDS gateway to be created.  The API was deprecated several releases ago and the behaviors consumed internally by SQL Server.   ODS provides an evening model for processing TDS requests as parsed by the SNI layer.


ODBC Trace

To capture an ODBC trace use the driver manager, trace capabilities (https://docs.microsoft.com/en-us/sql/odbc/admin/setting-tracing-options?view=sql-server-ver15)

 

1.       Launch odbcad32.exe

2.       Select the tracing tab and then start tracing

3.       Execute the reproduction

4.       Stop tracing and view the log file

 

Execution: sqlcmd -E -S.\sql17

ODBC trace records the ENTER and EXIT activities for the ODBC APIs as well as the various parameters, values and messages.

 

               740c-592c   ENTER SQLDriverConnectW

              HDBC                0x0000019DE23D8CC0

              HWND                0x0000000000000000

              WCHAR *             0x00007FFF581D21C0 [      -3] "******\ 0"

              SWORD                       -3

              WCHAR *             0x00007FFF581D21C0

              SWORD                       -3

              SWORD *             0x0000000000000000

              UWORD                        0 <SQL_DRIVER_NOPROMPT>

 

                740c-592c  EXIT  SQLDriverConnectW  with return code 1 (SQL_SUCCESS_WITH_INFO)

              HDBC                0x0000019DE23D8CC0

              HWND                0x0000000000000000

              WCHAR *             0x00007FFF581D21C0 [      -3] "******\ 0"

              SWORD                       -3

              WCHAR *             0x00007FFF581D21C0 <Invalid buffer length!> [-3]

              SWORD                       -3

              SWORD *             0x0000000000000000

              UWORD                        0 <SQL_DRIVER_NOPROMPT>

 

              DIAG [01000] [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]Changed database context to 'master'. (5701)

              DIAG [01000] [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]Changed language setting to us_english. (5703)

 

Going Beyond ODBC Trace

For my specific issue I was attempting to debug why the SQLDriverConnectW errored with TCP 10054 because it was failing the pre-login handshake.  For that I first needed to understand and capture the flow of the handshake activities and this is where SNITrace is helpful.

 

10054 The Direction Matters

Understanding where the RST (TCP closure) occurs if helpful in understanding the problem and how to troubleshoot it.

Each stage in the login process uses asynchronous read and write timeouts.  For example, when the SQL Server sends the initial Pre-Login Response the SQL Server posts a read request using a 5 second timeout.  If the first Ssl/Tls context packet takes longer than 5 seconds to arrive, from the client, SQL Server closes the TCP connection (RST occurs from the SQL Server) and the next write attempt from the SQL Client fails with a 10054 error, the connection has been reset.   The Ssl/Tls reads use similar timeout mechanisms and the SQL client provider honors the ‘login timeout.’  For example, if the login packet is sent by the client but the SQL Server does not respond before the established login timeout (30 second default) the client closes the TCP connection (RST from the SQL Client provider) and the server records the 10054, connection reset error and terminates the login attempt.

 

Before diving into SNITrace let’s breakdown the network trace of a successful and failed login.

 

Good Login Sequence  (SYN, ACK, … FIN)

 

10054 sequence from SQL Server | Start login at 48.8 and server terminates at 52.9 (SYN, ACK … RST)

 

SQL Server exchanges multiple packets to complete a login request.   The entire Tabular Data Stream protocol (TDS) is documented here: https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-tds/b46a581a-39de-4745-b076-ec4dbb7d13ec

A login request performs the following steps.

 

-          Open the TCP connection (usually to port 1433 on the SQL Server)

-          Send pre-login request from client

-          Receive pre-login response from server

-          Exchange Ssl/Tls information (InitializeSecurityContext/AcceptSecurityContext) by using TDS to exchange the security context information (1 to 4 packets)

-          Send login information (encrypted)

 

Login Annotated

The following table contains a high-level annotation of the SQL login and a select @@VERSION from sqlcmd.

 

 

No.

Source

Destination

Protocol

Length

Info

Client sends TCP open request to SQL Server (SYN) computer

9489

x.x.x.1

x.x.x.100

TCP

66

56369  >  1433 [SYN] Seq=0 Win=65280 Len=0 MSS=1360 WS=256 SACK_PERM=1

TCP acknowledges request (If port was incorrect this is where server sends RST 10054)

9490

x.x.x.100

x.x.x.1

TCP

66

1433  >  56369 [SYN, ACK] Seq=0 Ack=1 Win=8192 Len=0 MSS=1396 WS=256 SACK_PERM=1

 

9491

x.x.x.1

x.x.x.100

TCP

54

56369  >  1433 [ACK] Seq=1 Ack=1 Win=262400 Len=0

SQL Client sends TDS pre-login

9492

x.x.x.1

x.x.x.100

TDS

142

TDS7 pre-login message

SQL Server responds to pre-login request

9493

x.x.x.100

x.x.x.1

TDS

102

Response

 

9494

x.x.x.1

x.x.x.100

TDS

250

TDS7 pre-login message

 

9495

x.x.x.100

x.x.x.1

TCP

1414

1433  >  56369 [ACK] Seq=49 Ack=285 Win=262144 Len=1360 [TCP segment of a reassembled PDU]

 

9496

x.x.x.100

x.x.x.1

TCP

1414

1433  >  56369 [ACK] Seq=1409 Ack=285 Win=262144 Len=1360 [TCP segment of a reassembled PDU]

 

9497

x.x.x.1

x.x.x.100

TCP

54

56369  >  1433 [ACK] Seq=285 Ack=2769 Win=262400 Len=0

 

9498

x.x.x.100

x.x.x.1

TCP

1414

1433  >  56369 [ACK] Seq=2769 Ack=285 Win=262144 Len=1360 [TCP segment of a reassembled PDU]

Ssl/Tls negotiation packet exchange (SQL Client InitializeSecurityContext -> SQL Server AcceptSecurityContext)

9499

x.x.x.100

x.x.x.1

TDS

70

TDS7 pre-login message (Not last buffer)

 

9500

x.x.x.1

x.x.x.100

TCP

54

56369  >  1433 [ACK] Seq=285 Ack=4145 Win=262400 Len=0

Note: The SQL Server can send a RST for 5 second timeouts in this exchange window.

9501

x.x.x.100

x.x.x.1

TDS

428

TDS7 pre-login message

 

9502

x.x.x.1

x.x.x.100

TDS

188

TDS7 pre-login message

 

9503

x.x.x.100

x.x.x.1

TDS

113

TDS7 pre-login message

End of Ssl/Tls negotiation process

9504

x.x.x.1

x.x.x.100

TDS

349

TLS exchange

 

9505

x.x.x.100

x.x.x.1

TCP

54

1433  >  56369 [ACK] Seq=4578 Ack=714 Win=261632 Len=0

SQL Client sends Login information (SendLogin)

9506

x.x.x.100

x.x.x.1

TDS

800

TLS exchange

 

9507

x.x.x.1

x.x.x.100

TDS

163

TLS exchange

Note: The client can send a RST to honor the login timeout during this window

9509

x.x.x.100

x.x.x.1

TDS

125

TLS exchange

 

9510

x.x.x.1

x.x.x.100

TDS

147

TLS exchange

 

9511

x.x.x.100

x.x.x.1

TDS

120

TLS exchange

SQL Server finishes acknowledgement of successful login after sending back config info (language, …)

9512

x.x.x.1

x.x.x.100

TCP

54

56369  >  1433 [ACK] Seq=916 Ack=5461 Win=261120 Len=0

Idle keep alive, client not doing anything (sqlcmd at prompt)

9627

x.x.x.1

x.x.x.100

TCP

55

[TCP Keep-Alive] 56369  >  1433 [ACK] Seq=915 Ack=5461 Win=261120 Len=1

 

9628

x.x.x.100

x.x.x.1

TCP

66

[TCP Keep-Alive ACK] 1433  >  56369 [ACK] Seq=5461 Ack=916 Win=261632 Len=0 SLE=915 SRE=916

 

9629

x.x.x.100

x.x.x.1

TCP

55

[TCP Keep-Alive] 1433  >  56369 [ACK] Seq=5460 Ack=916 Win=261632 Len=1

 

9630

x.x.x.1

x.x.x.100

TCP

66

[TCP Keep-Alive ACK] 56369  >  1433 [ACK] Seq=916 Ack=5461 Win=261120 Len=0 SLE=5460 SRE=5461

SQL Client sends query (select @@VERSION)

9741

x.x.x.1

x.x.x.100

TDS

149

TLS exchange

SQL Server responds with version information

9743

x.x.x.100

x.x.x.1

TDS

337

TLS exchange

 

9744

x.x.x.1

x.x.x.100

TCP

54

56369  >  1433 [ACK] Seq=1011 Ack=5744 Win=262400 Len=0

SQL Client sends disconnect request and closes TCP connection

9772

x.x.x.1

x.x.x.100

TCP

54

56369  >  1433 [FIN, ACK] Seq=1011 Ack=5744 Win=262400 Len=0

 

9773

x.x.x.100

x.x.x.1

TCP

54

1433  >  56369 [ACK] Seq=5744 Ack=1012 Win=261376 Len=0

 

9774

x.x.x.100

x.x.x.1

TCP

54

1433  >  56369 [FIN, ACK] Seq=5744 Ack=1012 Win=261376 Len=0

Server acknowledges TCP closure

9775

x.x.x.1

x.x.x.100

TCP

54

56369  >  1433 [ACK] Seq=1012 Ack=5745 Win=262400 Len=0

 

SNI Tracing

Understating where the TCP Close (RST) originates provides you with diagnostic guidance.   A general rule for login is if the SQL Server sends the RST it is probably a client issue and if the SQL client provider sends the RST is it probably a SQL Server issue. 

 

Scenario (SQL Server Side): Client sent RST due to login timeout

Start by tracing the SQL error_reported and/or connectivity_ring_buffer events on the SQL Server.   The events and 17803 error highlight the various login steps occurring and provides timing information to help you understand the failure condition. My previous post highlights the login timer and 17803 event in detail: https://bobsql.com/sql-mysteries-sql-server-login-timeouts-a-debugging-story/

Assume the information reveals the login attempt is taking longer than expected and we want to know more about internal activities.   We can accomplish that using the SQL Server side SNI Tracing, exposed with a dynamic XEvent session. 

 

Note: The BID/SNI Trace for the SQL Server (SQL 2008’ish days) used to require ETW setup and a restart of SQL Server.   Newer versions of SQL Server do not require ETW or restart of SQL Server, instead the SNI trace events are produced to the XEvent session dynamically.  You can start and stop the session, apply filters and apply predicates dynamically.

 

The name of the XEvent is not obvious (I am working with the XEvent team to correct this.)  The event is named ‘trace’ and here is a script to create an SQL SNI tracing session.

 

CREATE EVENT SESSION [SNIRdorr] ON SERVER

ADD EVENT sqlsni.trace(

    ACTION(package0.event_sequence,sqlserver.client_connection_id,sqlserver.client_hostname,sqlserver.client_pid,sqlserver.nt_username,sqlserver.session_id))

       WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0                          KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=ON,STARTUP_STATE=OFF)

 

I started the XEvent trace on my SQL Server and established a connection using sqlcmd.   Looking at task manager the process id for sqlcmd was (PID = 5188.)  I then stopped the XEvent capture and applied the XEvent filter to the client_pid column in the SQL Server Management Studio.  The SNI Trace to XEvent logs internal SNI function calls and can be combined with the client SNI Trace and network trace making the scenario easier to understand.  

 

ENTER: Highlighted in the screen capture below is the SNIPacketAllocateEx2, a memory allocation.  In fact, the SQL Server allocates 2 input packets and 1 output packet for each connection, keeping free packets in an object pool, partitioned by packet size. 

 

INFO: Looking closer the highlighted event is the start of the allocation attempt and the next event indicates the packet was allocated from the SNI object pool.  You can use a DMV query to see the SNI pools for SNI Packets (select * from sys.dm_os_memory_pools where type = 'OBJECTSTORE_SNI_PACKET'). Each pool accommodating a different packet size range.  You can also use sqlcmd’s -a parameter and the free_entries_count column of the DMV changes for different packet size and pool usage scenarios.

 

LEAVE: The 3rd SNIPacketAllocateEx2 is the return or exit from SNIPacketAllocateEx2

 

For the sake of this example assume that the elapsed duration from the ENTER to the LEAVE event took more than 5 seconds.  From that you might conclude it is taking a long time to allocate a packet and troubleshoot the memory related issue.  

 

There are other SNI events besides the ‘trace’ event.   The following query outlines the events contained in the sqlsni package which can provide you with additional debugging capabilities.

 

select o.name, o.description from sys.dm_xe_objects o

       inner join sys.dm_xe_packages p on p.guid = o.package_guid

       and o.object_type = 'event'

       and p.name = 'sqlsni'

 

 

 

Using the following XEvent session, reveals the packet allocations occur as part of SNISetInfo activity.

 

CREATE EVENT SESSION [SNIRdorr] ON SERVER

ADD EVENT sqlsni.trace( ACTION(package0.event_sequence,sqlserver.client_connection_id,sqlserver.client_hostname,sqlserver.client_pid,sqlserver.nt_username,sqlserver.session_id)),

ADD EVENT sqlsni.enter( ACTION(package0.event_sequence,sqlserver.client_connection_id,sqlserver.client_hostname,sqlserver.client_pid,sqlserver.nt_username,sqlserver.session_id)),

ADD EVENT sqlsni.leave(    ACTION(package0.event_sequence,sqlserver.client_connection_id,sqlserver.client_hostname,sqlserver.client_pid,sqlserver.nt_username,sqlserver.session_id))

WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=ON,STARTUP_STATE=OFF)

 

 

Scenario (SQL client side): Server sent RST due to pre-login read timeout

The SQL client is not an XEvent session, instead SNITrace is implemented using ETW tracing for the client providers (SQL ODBC, SQL OLEDB and SQLClient.)  For the SQL Server client providers such as sqlncli11.dll, msodbcsql17.dll, msoledbsql.dll, etc. ETW registration and enablement is required.

 

ETW tracing uses a major guid and event guids to capture and store the data.   The major guid represents the component (such as a major release of a SQL Server) and each event has a guid with an associated format string defined.  ETW uses the guids in much the same way as the Windows event log uses the message files (.mc) format strings (resource dlls.)   For optimized tracing formatting should always be done offline and not while producing the event.  The offline formatting approach saves storage space, reduces CPU and the impact on performance.   When an event is produced the ETW consumer checks to see if that even has been enabled for tracing, if so serializes the binary data values.   The ETW reporting features are used to process the capture and format into ‘human’ readable format.

 

During the build process the SQL client providers produce the major guid along with the format string and event guids.  These guids are major version aligned.   Unless you have a private symbol file (.pdb) that contains the ETW information or a ETW output file (Trace Message Format File .tmf) you do not know what register with ETW or use to enable the tracing.  The good news is the information is discoverable.

 

There are some older articles that discuss SNI client tracing and provide some of the older scripts and .MOF registration files.  (I have filed an issue to have the doc team update these.) You will still find these helpful.

 

https://docs.microsoft.com/en-us/previous-versions/sql/sql-server-2008/cc765421(v=sql.100)?redirectedfrom=MSDN
https://docs.microsoft.com/en-us/previous-versions/sql/sql-server-2012/hh880086(v=msdn.10)?redirectedfrom=MSDN

 

What Providers Are Installed

The SNI client tracing is controlled by logman.exe command.  Start by determining what is registered on your system.

 

logman query providers
logman query providers | findstr /I sql


On my system I have the Microsoft ODBC Driver for SQL Server registered (msodbcsql13.dll.)

 

Finding the Trace GUID to Enable

When the ODBC driver loads the SQL client provider the provider registers with ETW (EtwRegister.)  At this time the provider determines if/what events should be produced and therefore you usually have to restart the application to enable the trace activities.

 

The MSDADIAG.ETW trace is how you discover the guid of the SQL client providers.

 

{8B98D3F2-3CC6-0B9C-6651-9649CCE5C752}  0xFFFFFFFF  0   MSDADIAG.ETW


Make sure MSDADIAG.ETW is a registered provider:  logman query providers | findstr /I MSDADIAG.ETW If not present use the registration information from the older package links above to execute the mofcomp for the MSDADIAG.ETW trace provider.

Once MSDADIAG.ETW is registered, enable MSDADIAG.ETW tracing.   (Hint: The older links contain the _registerSchema.cmd, _startTrace.cmd, _stopTrace.cmd and _report.cmd to assist you.)

 

logman start MyTrace -pf traces.txt -ct perf -o Out.etl -ets


Traces.txt
should contain the following content

 

{8B98D3F2-3CC6-0B9C-6651-9649CCE5C752}  0xFFFFFFFF  0   MSDADIAG.ETW


The 0xFFFFFFFF enables all event groups tracing for the MSDADIAG.ETW tracing provider. 

Run your application and then stop the tracing.

 

logman stop MyTrace -ets


Now you can convert the out.etl to a CSV file format.

 

traceRPT /y Out.etl /of CSV


The output shows some of the common files and associated guids.

 

For this example, I am looking for msodbcsql13.dll and there you have it.  The MSDADIAG.ETW shows the initial registration attempt from the Microsoft SQL Server ODBC provider with the major GUID you need to enable tracing of the provider.

 

Registering the SQL client SNI Trace Provider

Now you create a .MOF file for the msodbcsql provider registration.  You can take one of the older .MOF files from the links above, modify the description and guid to look like the following, and then use mofcomp to complete registration.

 

Register: mofcomp msodbcsql.mof

#pragma autorecover

#pragma classflags("createonly")

#pragma namespace ("\\\\.\\Root\\WMI")

#pragma deleteclass("Bid2Etw_MSODBCSQL_1", NOFAIL)

/////////////////////////////////////////////////////////////////////////////

//

//  Microsoft ODBC Driver for SQL Server

 

[

 dynamic: ToInstance,

 Description("Microsoft ODBC Driver for SQL Server"),

 Guid("{06DEBC35-76AB-2719-2ABE-E7E57F1DAD2C}"),

 locale("MS\\0x409")

]

class Bid2Etw_MSODBCSQL_1 : EventTrace

{

};

 

[

 dynamic: ToInstance,

 Description("Microsoft ODBC Driver for SQL Server"),

 Guid("{06DEBC35-76AB-2719-2ABE-E7E57F1DAD2C}"),

 locale("MS\\0x409")

]

class Bid2Etw_MSODBCSQL_1_Trace : Bid2Etw_MSODBCSQL_1

{

};

 

[

 dynamic: ToInstance,

 Description("Microsoft ODBC Driver for SQL Server formatted output (A)"),

 EventType(17),

 EventTypeName("TextA"),

 locale("MS\\0x409")

]

class Bid2Etw_MSODBCSQL_1_Trace_TextA : Bid2Etw_MSODBCSQL_1_Trace

{

    [

     WmiDataId(1),

     Description("Module ID"),

     read

    ]

    uint32 ModID;

 

    [

     WmiDataId(2),

     Description("Text StringA"),

     extension("RString"),

     read

    ]

    object msgStr;

};

 

[

 dynamic: ToInstance,

 Description("Microsoft ODBC Driver for SQL Server formatted output (W)"),

 EventType(18),

 EventTypeName("TextW"),

 locale("MS\\0x409")

]

class Bid2Etw_MSODBCSQL_1_Trace_TextW : Bid2Etw_MSODBCSQL_1_Trace

{

    [

     WmiDataId(1),

     Description("Module ID"),

     read

    ]

    uint32 ModID;

 

    [

     WmiDataId(2),

     Description("Text StringW"),

     extension("RWString"),

     read

    ]

    object msgStr;

};


You can verify the registration outcome using a logman query.

 

logman query providers | findstr /I sql


Enabling SNI Tracing (Start, Stop and Report)

Add the msodbcsql provider to the traces.txt file allowing you to start, stop and report with the msodbcsql events.

 

Traces.txt should contain the following content.

 

{8B98D3F2-3CC6-0B9C-6651-9649CCE5C752}  0xFFFFFFFF  0   MSDADIAG.ETW
{06DEBC35-76AB-2719-2ABE-E7E57F1DAD2C}   0xFFFFFFFF  0   MSODBCSQL13


Here is an example capture of sqlcmd (client side) showing SNIPacketAllocateEx activities.

 

SNI Trace Group Masks

You can control the event groups traced by modifiing the mask in the traces.txt file to one or more of the following values.

 

Default

0x1

Common Trace

0x2

Scope

0x4

Performance

0x8

Increment, Decrement, Set and Reset

0x10

Memory

0x20

Error and HResults

0x40

Advanced

0x80

Reservered 1

0x100

Reserveed  2

0x200

Reservered 3

0x400

Bid

0xFFF

User

0xFFFFF000

All

0xFFFFFFFF


SNI Trace on Linux

SNI Trace is also available for native Linux applications.   Instead of using ETW the output is written to a text file, registered in ODBCINST.INI. Here is an example:

 

["ODBC Driver 17 for SQL Server"]

BIDTrace=yes
BIDTraceFile=<<File path>>
BIDTraceFileSize=100240


Note:
A value of zero(0) for the file size indicates no size limit.  Recommendation: Set the file size and allow rollover to provide you with smaller files to troubleshoot.


Note: File size is in bytes and limited to 4GB per file.

 

Recap

The SQL Server error messages, ODBC Trace, XEvents, network tracing and SNI Trace provide lots of details about your application and the SQL Server TDS network activities.  While it takes a bit more work to register for SNI tracing, the first time on the client, the SNI Trace is powerful and easy to start, stop and report once registered.

 

The tracing facilities can help you resolve networking issues impacting your SQL Server.

 

 

 

 

Updated Feb 09, 2022
Version 1.0
No CommentsBe the first to comment