SQL Server Alwayson primary replica failed to create diagnostics log.
Published Mar 15 2020 07:29 PM 2,881 Views
Microsoft

SQL Server Alwayson diagnostics log is a useful tool to troubleshoot SQL Server Alwayson related issue. It saves the result of stored procedure system stored procedure sp_server_diagnostics (Transact-SQL)  to SQLDIAG file, the naming convention is serverName_instanceName_SQLDIAG_xxx.xel.

This log is only created in SQL Server cluster active node or SQL Server Alwayson Availability group primary replica. Please note, it’s created by RHS.exe instead of SQL Server.

Following screenshot of process monitor shows callstack of creating the diagnostic log by RHS.exe. As you see, RHS.exe, hadrres.dll and xe.dll are used.

 

processMonitor.png

When SQL Server is installed, it creates hadrres.dll in MSSQL\Binn folder and creates xe.dll in C:\Program Files\Microsoft SQL Server\version\Shared\ folder by default , it also registers the hadrres.dll in c:\windows\system32.

If more than one SQL Server versions are installed, the highest version of hadrres.dll is registered in in c:\windows\system32, it’s Backwards compatible.

For example, if you have following topology.

SQL Box1:
           Instance 1:SQL server 2012 (Primary replica)
           Instance 2: SQL server 2014 (this is a standalone instance, AG is not used)
  SQL Box2:
           Instance 1:SQL server 2012 (Secondary replica)

SQL 2014 version of hadrres.dll registered in c:\windows\system32 in box1.

SQL 2012 version of hadrres.dll registered in c:\windows\system32 in box2.

 

In box1, the RHS.exe will load the SQL 2014 version hadrres.dll, which will loads the SQL Server 2014 xe.dll, when the instance 1 becomes primary replica.

In box2, the RHS.exe will load the SQL 2012 version hadrres.dll, which will loads the SQL Server 2012 xe.dll, when the instance 1 becomes primary replica.

The location of xe.dll is saved in register HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\version\SharedCode

 
 

sharedCode.png

 

Hadrres.dll retrieves(hardcoded) the location of xe.dll by reading the register HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\version\SharedCode, then load the xe.dll.

SQL 2019 : HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\150\SharedCode

SQL 2017 : HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\140\SharedCode

SQL 2016 : HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\130\SharedCode

SQL 2014 : HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\120\SharedCode

SQL 2012 : HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\110\SharedCode

After xe.dll is loaded, it will create the diagnostic log based on the configuration(check sys.dm_os_server_diagnostics_log_configurations)

Here are some scenarios cause the diagnostic fail to be created:

1.Diagnostic log is off.

ALTER SERVER CONFIGURATION SET DIAGNOSTICS LOG off

2.HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\version\SharedCode does not exist or have invalid value.

3.The file xe.dll retrieved from the register of step 2 does not exist.

4.Version of xe.dll does not match the version of hadrres.dll(it may cause issue or may not, depends on SQL version).

5.RHS.exe does not have permission write file in the target location(check the DMV sys.dm_os_server_diagnostics_log_configurations).

‘Diagnostics log’ related logs:

If the diagnostic log is disabled, you will get below info from cluster log.

INFO [RES] SQL Server Availability Group: [hadrag] Extended Event logging is disabled

If the path changes, you will get below info from cluster log.

INFO [RES] SQL Server Availability Group: [hadrag] The property LogPath was changed from ‘???’ to ‘????’

Once the log file is created, you will get below info from cluster log.

INFO [RES] SQL Server Availability Group: [hadrag] Extended Event logging is started

If the log fails to created, you will get below info from cluster log.

ERR   [RES] SQL Server Availability Group: [hadrag] Failed to initialize Extended Event logging engine with error [ErrorCode]

Here is the real case :

Customer had a SQL 2012 Alwayson Availability group consisted of with three replicas. One of the replicas failed to create diagnostic log when it became primary replica. Other two replicas were able to create diagnostic logs.

I launched process monitor against the RHS.exe in the replica in question. I found that the xe.dll was not loaded, and the hadrres.dll loaded was SQL Server 2014 version instead of SQL Server 2012.

Discussed with customer, I found that there was SQL Server 2014 was installed, then it’s removed. That’s the cause!

I’m able to reproduce the issue in my box.

SQL Box1:
           Instance 1:SQL server 2012 (Primary replica)
           Instance 2: SQL server 2014 (this is a standalone instance, AG is not used)
  SQL Box2:
           Instance 1:SQL server 2012 (Secondary replica)

Remove the instance 2, then instance 1 on box1 can’t create diagnostic log.

The issue happens because uninstalling SQL Server instance removes the xe.dll in C:\Program Files\Microsoft SQL Server\120\Shared, but still leaves the SQL 2014 hadrres.dll in c:\windows\system32.

This behavior causes hadrres.dll fail to load xe.dll because xe.dll is removed during uninstallation.

Workaround

===

Copy the same version SQL 2014 xe.dll back to C:\Program Files\Microsoft SQL Server\version\Shared.

Version history
Last update:
‎May 05 2020 10:04 AM
Updated by: