I was reviewing a test run this week that had more SQL 17830 (TCP 10054 / 0x2746) errors than I could explain from just kills or login timeouts.
I started looking at the output in the connectivity ring buffer and the matching XEvent, error_reported event where error_number = 17830. I thought I would find my answer among the login timing information (Reference: https://blogs.msdn.microsoft.com/bobsql/2019/02/10/sql-mysteries-sql-server-login-timeouts-a-debuggi....) Unfortunately, the timings were always small and not something the SQL Server engine was doing, instead the client was closing the connection. In fact, I was able to break the reproduction down to a single connection using an idle SQL Server so there was no impact from other workloads.
Perhaps the problem was a lag on the client or TCP layer?
I was expecting to see the client start the connection by sending the TCP SYN, perhaps taking a long time to get to the server. I thought the network trace might reveal a SYN, a long delay that exceeded the connection timeout and a close (RST) from the client. Instead, what I saw was the SYN (tcp open) followed quickly by the RST (tcp close.)
What was causing the SQL Server networking client to call the TCP open and then call TCP close without exceeding the connection timeout and without attempting the TDS login activities?
To make the scenario a bit more interesting:
With some help from Dylan and Brian I was able to break down a reproduction to a Powershell script using SQLClient to connect to SQL Server.
while (1 -eq 1)
$connectionString = ‘Data Source=MyServer,1433;database=master;User ID=sa;Password=xxxxxxxxx;Pooling=False’
$sqlConnection = New-Object System.Data.SqlClient.SqlConnection $connectionString
Attaching the debugger to the powershell repro I narrowed the issue to a close being called during the open.
What I discovered was the SQLClient attempting to connect to the server using transparent networking Ip resolution. The servers exhibiting the behavior had multiple IP addresses registered with DNS and the servers that didn’t exhibit the behavior had a single IP address registration. The SQLClient attempts to connect to all IP addresses returned from getaddrinfo, in parallel, if the target IP address registration count is between 2 and 64. The SQLClient starts asynchronous, tcp open (SYN) requests to the listed IP addresses. The first connection to accept the SYN ACK from the server (accept the open request) wins and the other, parallel, open requests are closed (RST.) This explains why I always saw a burst of open requests (SYN) to the server but only one of them succeeded and others were closed (RST.)
The SQLClient provides a connection property to control the transparent network ip resolution behavior.
Property = TransparentNetworkIPResolution : https://docs.microsoft.com/en-us/dotnet/api/system.data.sqlclient.sqlconnection.connectionstring?vie...
I wrote this blog to help you troubleshoot additional 10054 error patterns. By pointing out the behavior of the SYN, RST occurring from the client, in a back-to-back fashion, you are able to filter these (treat as noise) and focus on other issues that might be causing 10054 errors on your system.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.