We recently received a problem report about high CPU usage in an Always On environment. The issue occurs when Always On availability groups and Log Shipping are used together.
Symptom
The scenario can be reproduced with 10 Availability Groups (AGs) and 5 databases per AG.
SQL Server Log Shipping is configured for all 50 databases. The idea is to keep a separate copy of these databases that follows production with a certain delay (the delay is intentional, due to business needs).
Whenever the Log Shipping backup jobs are triggered, CPU usage peaks on the Log Shipping primary server. This is most visible if you keep the default schedules, so that all jobs are triggered at the same time.
Perfmon shows the peak in % Processor Time.
Task Manager shows high CPU usage by cluster.exe and sqlservr.exe (other contributors are sqlagent.exe and multiple instances of sqllogship.exe).
To understand this further, a WPR (Windows Performance Recorder) trace was captured in this environment. The investigation showed that the CPU usage is mainly kernel time and is caused by a Cluster API call made by SQL Server:
sqlmin.dll!HadrWsfcUtil::OpenClusterResourceKey
sqlmin.dll!HadrAvailabilityGroupStore::Retrieve
sqlmin.dll!HadrAvailabilityGroup::Refresh
sqlmin.dll!HadrAvailabilityGroup::HadrAvailabilityGroup
sqlmin.dll!HadrAvailabilityGroupObjectModel::GetAvailabilityGroupByResourceId
sqlmin.dll!HadrAvailabilityGroupObjectModel::GetAvailabilityGroupByName
sqlmin.dll!DmHadrInternalAgStatesTable::GetAgConfiguration
sqlmin.dll!DmHadrInternalAgStatesTable::InternalGetRow
sqlmin.dll!CQScanTVFStreamNew::GetRow
sqlmin.dll!CQScanLightProfileNew::GetRow
sqlmin.dll!CQScanFilterNew::GetRowHelper
sqlmin.dll!CQScanLightProfileNew::GetRow
sqlmin.dll!CQueryScan::GetRow
sqllang.dll!CXStmtQuery::ErsqExecuteQuery
sqllang.dll!CXStmtSelect::XretExecute
sqllang.dll!CMsqlExecContext::ExecuteStmts<1,1>
sqllang.dll!CMsqlExecContext::FExecute
sqllang.dll!CSQLSource::Execute
sqllang.dll!CSQLObject::ExecuteFunction
sqllang.dll!CUdfExecInfo::InvokeTSQLScalarUDF
sqllang.dll!UDFInvokeImpl
sqltses.dll!CEsExec::GeneralEval4
sqlmin.dll!CQScanProjectNew::EvalExprs
sqlmin.dll!CQScanProjectNew::GetRow
sqlmin.dll!CQScanLightProfileNew::GetRow
sqlmin.dll!CQueryScan::GetRow
sqllang.dll!CXStmtQuery::ErsqExecuteQuery
sqllang.dll!CXStmtSelect::XretExecute
sqllang.dll!CMsqlExecContext::ExecuteStmts<1,1>
sqllang.dll!CMsqlExecContext::FExecute
sqllang.dll!CSQLSource::Execute
sqllang.dll!ExecuteSql
sqllang.dll!CSpecProc::ExecuteSpecial
sqllang.dll!CSpecProc::Execute
sqllang.dll!process_request
sqllang.dll!process_commands_internal
sqllang.dll!process_messages
sqldk.dll!SOS_Task::Param::Execute
sqldk.dll!SOS_Scheduler::RunTask
sqldk.dll!SOS_Scheduler::ProcessTasks
sqldk.dll!SchedulerManager::WorkerEntryPoint
sqldk.dll!SystemThreadDispatcher::ProcessWorker
sqldk.dll!SchedulerManager::ThreadEntryPoint
kernel32.dll!BaseThreadInitThunk
ntdll.dll!RtlUserThreadStart
[Root]
The reason is that Log Shipping calls the system function sys.fn_hadr_backup_is_preferred_replica for each of those backups, so that it can read the AG backup preferences and act accordingly.
This system function relies on system views that, in the background, make Cluster API calls to fetch this data (AG role ownership) from the cluster database.
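To make the call pattern concrete, here is a minimal Python sketch of the kind of check a backup job performs before backing up one database. Only sys.fn_hadr_backup_is_preferred_replica comes from the scenario above. The helper name, the query constant, and the use of a DB-API style cursor with `?` parameter markers (as pyodbc uses) are assumptions for illustration, not what Log Shipping runs internally.

```python
# Hypothetical helper: asks SQL Server whether the current replica is the
# preferred backup replica for one database.
PREFERRED_REPLICA_QUERY = (
    "SELECT sys.fn_hadr_backup_is_preferred_replica(?) AS is_preferred;"
)


def is_preferred_backup_replica(cursor, database_name):
    """Return True if this replica is preferred for backups of database_name.

    `cursor` is assumed to be a DB-API cursor (for example from pyodbc)
    that is already connected to the SQL Server instance.
    """
    cursor.execute(PREFERRED_REPLICA_QUERY, (database_name,))
    row = cursor.fetchone()
    # The function returns a bit value: 1 = preferred, 0 = not preferred.
    return bool(row[0])
```

Every such call ends in the Cluster API path shown in the stack above, so 50 jobs firing in the same second means 50 of these lookups competing at once.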
Cause
Concurrent and repeated calls to this Cluster API, one for every Log Shipping backup, can cause this high CPU usage.
Resolution
This behavior is due to the live nature of those parameters: they are read from the cluster each time the function is called. To mitigate it, stagger the Log Shipping backup job start times to minimize concurrency, or distribute the AG primary replicas across multiple servers.
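One way to plan the staggering is to spread the job start times evenly across the backup interval. The Python sketch below does that for the 50-database scenario. The job names, the 15-minute interval, and the function name are assumptions for illustration. The offsets are formatted as HHMMSS integers, the format SQL Server Agent schedules use for their start time.

```python
def staggered_start_times(job_names, interval_minutes=15):
    """Spread jobs evenly across one backup interval.

    Returns a dict mapping each job name to its start offset within the
    interval, encoded as an HHMMSS integer (for example 1442 = 00:14:42).
    """
    if not job_names:
        return {}
    interval_seconds = interval_minutes * 60
    # Whole seconds between consecutive job starts.
    step = interval_seconds // len(job_names)
    schedule = {}
    for index, name in enumerate(job_names):
        offset = index * step
        hours, remainder = divmod(offset, 3600)
        minutes, seconds = divmod(remainder, 60)
        schedule[name] = hours * 10000 + minutes * 100 + seconds
    return schedule


# Hypothetical job names: 10 AGs with 5 databases each.
jobs = [f"LSBackup_AG{ag:02d}_DB{db}" for ag in range(1, 11) for db in range(1, 6)]
plan = staggered_start_times(jobs, interval_minutes=15)
```

With 50 jobs and a 15-minute interval, the jobs start 18 seconds apart instead of all at once. Whether 18 seconds is enough spacing depends on how long each backup takes in your environment.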
Updated May 13, 2025
Version 1.0
haci
Microsoft
SQL Server Support Blog