We get our fair share of cases related to SQL Server running ( or not running) on a Windows Cluster. I had one of them recently where the customer was seeing different authentication modes for SQL Server depending on which node of the 2-node cluster that it was online on. The Errorlogs document this behavior clearly as follows
We see that Sql came online on node P1 it was in Mixed Mode
2010-05-08 05:17:41.13 Server Authentication mode is
2010-05-08 05:17:41.70 spid4s The NETBIOS name of the local node that is running the server is 'P1'.
2010-05-08 23:26:05.69 spid4s SQL Trace was stopped due to server shutdown. Trace ID = '1'.
Afer a failover Sql came online on node P2 in Windows Authentication mode
2010-05-08 23:26:52.82 Server Authentication mode is
2010-05-08 23:26:53.60 spid5s The NETBIOS name of the local node that is running the server is 'P2'.
2010-05-09 10:16:18.55 spid5s SQL Trace was stopped due to server shutdown. Trace ID = '1'
Finally when they fail back to P1, SQL came online in Mixed Mode
2010-05-09 10:16:34.50 Server Authentication mode is
2010-05-09 10:16:45.42 spid4s The NETBIOS name of the local node that is running the server is 'P1’
The only recent change to their SQL Server environment was that they applied Service Pack 3 a couple of weeks back but they had not tested failover till this weekend – which is when they ran into this issue.
The behavior clearly indicates that there is something amiss between the two nodes of the cluster. I had just started to compare the registry keys between the nodes, when my colleague Adam Saxton suggested that this might be a problem with checkpoint not getting applied.
After some additional research I found that Service Pack 3 relies on the checkpoint to be applied to the passive node on the first failover- hence making it a non -issue in most cases. In this case I needed to first determine whether the checkpoint file was even there to be applied to the passive node or not.
Ok so let us see how to determine where the checkpoint files for SQL Server instance are located in a cluster. We start by first finding the GUID under HKEY_LOCAL_MACHINE\Cluster\Resources that corresponds to our SQL Server instance i.e. where the name=SQL Server
Now expand the GUID folder and click on the RegSync key under this GUID and you should see seven keys in it from 00000001 thru 00000007 as shown below.
If you are curious as to where these files actually reside, they are under the Quorum drive in the folder Q:\MSCS\<SQL Server GUID>
In this customer’s case we only had one key :- 00000007. Since the customer was clearly missing the checkpoint files, we had to recreate them by following these steps:-
a. Bring SQL Server online on working node with Mixed mode authentication
b. Take SQL Server resource group offline from within Cluster Administrator
c. Open regedit and add the following keys under HKEY_LOCAL_MACHINE\Cluster\Resources\<SQL Server GUID>\RegSync ( via Right click à New String Value in the right hand pane)
Value name Value Data
00000001 Software\Microsoft\Microsoft SQL Server\MSSQL.1\Replication
00000002 Software\Microsoft\Microsoft SQL Server\MSSQL.1\SQLserverAgent
00000003 Software\Microsoft\Microsoft SQL Server\MSSQL.1\Cluster
00000004 Software\Microsoft\Microsoft SQL Server\MSSQL.1\MSSQLSERVER
00000005 Software\Microsoft\Microsoft SQL Server\MSSQL.1\PROVIDERS
00000006 Software\Microsoft\Microsoft SQL Server\MSSQL.1\SQLServerSCP
d. Bring SQL Server resource group online from within Cluster Administrator
At this time you should see some new .CPT files created under Q:\MSCS\<GUID>\ indicating that the checkpointing is now working as expected. The next step is to attempt failover to the passive node, during which the checkpoint file shall get applied there and then SQL Server shall come up with Mixed Mode authentication on it as well.