The Tao of a Deadlock Scheduler in SQL Server

Microsoft

Feb 11, 2019

First published on MSDN on Dec 14, 2012

Just thought I would summarize the concept of a SOS Scheduler and a Deadlocked Scheduler since many have inquired over the years for a brief summary of these concepts

UMS/SOS Scheduler in SQL Server

Since SQL Server 7.0, SQL Server has used its own scheduling mechanism, called UMS (User-mode scheduler) in 7.0 and 2000 and later renamed to SOS (SQL on OS scheduler). UMS/SOS is a cooperative (non-preemptive) scheduler which means that it relies on threads to voluntarily give up CPU usage - yield - to the next thread waiting in line. That means that there are locations in the SQL Server code where the Microsoft developer built in yield points, causing execution to “pause” and gracefully let another citizen in the SQL kingdom to exercise its right to execute. Of course, once a thread "leaves" SQL Server SOS kingdom, that thread is still handed off to the Windows preemptive scheduler (ruled by another Master so to speak). The goal of SOS is for SQL Server to expose only one thread at a time per CPU and thus minimize competition among SQL threads exposed to the OS. The ultimate goal is to reduce the very expensive kernel-mode context switching from one thread to another.

Since SQL Server uses a cooperative scheduler, it relies on the good heart of each developer who writes code in SQL Server to call a Yield() function of some kind that prevents the thread from monopolizing the CPU. But even the best-intentioned developer could make mistakes (and introduce a bug in his code), or a SQL thread could be at the mercy of some external component – like calling into code outside of SQL Server. Because of that, the SOS scheduler has a dedicated thread -Scheduler Monitor - that periodically checks the state of each scheduler and reports any "irregularities". Examples of irregularities include a thread not yielding voluntarily (a non-yielding scheduler) or all schedulers are "stuck" not processing any requests (deadlocked schedulers). Typically this Scheduler Monitor thread will report problems every 60 seconds, though it monitors the IOCP thread that accepts connections every 15 seconds .

An entire white paper could be written about how SOS scheduler works but for this discussion I will limit the information to some summary points. Actually, a white paper has been written already - SOS Scheduler white paper

What is a Deadlocked Scheduler?

A deadlocked scheduler scenario is reported when ALL SOS Schedulers are reported to be "stuck". Technically, this means that since the last Scheduler Monitor check - say about 60 seconds earlier - nothing has changed and no work is being processed. Here are some of things checked:

Are there any tasks waiting to be processed? (check if sys.dm_os_schedulers.work_queue_count > 0)

Have any new worker threads been created since last check?

Has any work been processed since last check (i.e. have any new tasks been assigned to workers)?

If any of these conditions is met (not all), the Scheduler Monitor declares that schedulers are stuck - deadlocked.

A note on NUMA: SQL Server can report Deadlocked Schedulers if all the SOS schedulers on a single NUMA node are stuck, even if other NUMA nodes are processing tasks just fine.

When the Scheduler monitor detectes a "deadlock schedulers" condition, it reports an error in the Errorlog and triggers a memory dump to be generated by the SQLDumper.exe against SQL Server process.

What "Deadlocked Schedulers" is not?

The phraase "deadlocked schedulers" has confused many and rightly so. The term does NOT imply a classic Lock-Manager deadlock where multiple sessions are trying to access locks in the opposite order and permanently block each other. The term applies to SOS Schedulers and the fact that none of them are processing queries, logins, etc. So if you are thinking of "traditional" lock deadlocks, you may steer away from such thoughts.

Most Common Causes of Deadlocked Schedulers:

Even though many conditions can cause all schedulers to be stuck at the same time, in 95% of the time that I have looked at memory dumps, the reason has been that the majority of worker threads were tied up waiting for a lock resource. Yes, the good ol' blocking chains with a head blocker that you are so familiar with, is the most common example. This causes the number of available SQL worker threads to be exhausted. The worker threads themselves are tied up because they are waiting for a lock and when new tasks (work) is submitted to the SQL Server, there are no worker threads available to pick the tasks up and execute them. Of courses, Locks are not the only resource you see contention on. Here is the list of other types of resources that have caused long "blocking" (I use the term loosely here) chains.

Lock(s) - a very long blocking chain

Latch(es)

I/O latches - which means stuck I/O
Heavy Latch chain
Orphaned/unreleased page or non-page latches (commonly due to a previous AV exception or assert) - not common

Spinlock(s)

Orphaned/unreleased spinlock (due to a previous AV, assert or other exception)

Parallel queries

Huge number of child threads spawned causing exhaustion of worker threads on the system to service one or two queries

How to Troubleshoot Deadlocked Schedulers

Some deadlock scheduler scenarios will resolve by themselves - as in the case of parallel queries, long-held I/O latch or, rarely, blocking. Orphaned structures like spinlocks or latches would require that you restart SQL Server. In most realistic cases, you will be aware of the issue only after it has been going on for awhile or after it has resolved itself. Therefore, in those cases, you will be focusing on post-mortem analysis of the problem. However, in rare cases, you may be able to take action while the problem is happening and resolve it (see using DAC connection below).

There are two ways to find the root cause of deadlocked schedulers:

1.Using DAC connection and examining some DMVs

2.Examining a memory dump

1. Troubleshooting Deadlocked Schedulers using the DAC connection

In essence, the sole purpose of the Dedicated Admin Connection (DAC) is to troubleshoot server problems like deadlocked schedulers. That is, if you catch it "live", while it is happening. You can find details on how to connect to SQL Server using DAC here . The most effective use of DAC to investigate and resolve deadlock scheduler problems is if the issue is caused by blocking. Luckily, this is the most common reason for deadlocked schedulers so DAC can be an effective tool.

So, once you connect with DAC (I would suggest using sqlcmd -A instead of SSMS, since the latter may attempt multiple connections to SQL Server) , you can investigate a couple of things:

1. The state of all schedulers - sys.dm_os_schedulers and specifically if there is any queued work that is waiting.

2. If there are hundreds of blocked threads - sys.dm_exec_requests - Wait_type = LCK_* (any type of lock wait)

3. Identify the head blocker from sys.dm_exec_requests by manually tracing which session is blocking all others, and terminate it by using the KILL command.

Note : If the issue is caused by an orphaned latch or orphaned spinlock, then the only way to get out of this situation is to restart SQL Server.

2. Troubleshooting Deadlocked Schedulers by Examining a Memory Dump

Most commonly the best way to find the root cause of deadlocked schedulers is to figure out what the majority of worker threads are stuck behind. And the way to do this by examining the memory dumps that SQL Server automatically generates for this purpose.

Here are some examples of stacks that you will observe if you were to examine a memory dump (details on how to do so using public symbols can be found here )

Blocking:

A typical call stack of all the threads that are stuck on Lock waits will include the LockOwner::Sleep() function. Here is a example

ntdll!KiFastSystemCallRet
ntdll!ZwSignalAndWaitForSingleObject+0xc
kernel32!SignalObjectAndWait+0xaf
sqlservr!SOS_Scheduler::Switch+0x81
sqlservr!SOS_Scheduler::SwitchContext+0x2fd
sqlservr!SOS_Scheduler::SuspendNonPreemptive+0xb9
sqlservr!SOS_Scheduler::Suspend+0x2e
sqlservr!SOS_Event::Wait+0x12a
sqlservr! LockOwner::Sleep +0x13e
sqlservr!lck_lockInternal+0x990
sqlservr!GetLock+0x178
sqlservr!BTreeRow::AcquireLock+0x14c
sqlservr!IndexRowScanner::AcquireNextRowLock+0x1da
sqlservr!IndexDataSetSession::GetNextRowValuesInternal+0x53a
sqlservr!IndexDataSetSession::GetNextRowValues+0x2d
sqlservr!RowsetNewSS::FetchNextRow+0x3e
sqlservr!CQScanRowsetNew::GetRowWithPrefetch+0x2b
sqlservr!CQScanRangeNew::GetRow+0x141
sqlservr!CQScanTopNew::GetRow+0x108
sqlservr!CQScanUpdateNew::GetRow+0x76
sqlservr!CQueryScan::GetRow+0x5a
sqlservr!CXStmtQuery::InitForExecute+0x51
sqlservr!CXStmtQuery::ErsqExecuteQuery+0x291
sqlservr!CXStmtDML::XretDMLExecute+0x284
sqlservr!CXStmtDML::XretExecute+0x8b
sqlservr!CMsqlExecContext::ExecuteStmts<0,1>+0x7cf
sqlservr!CMsqlExecContext::FExecute+0x550
sqlservr!CSQLSource::Execute+0x73b
sqlservr!CStmtPrepQuery::XretExecute+0x45b
sqlservr!CExecuteStatement::XretExecute+0x1b
sqlservr!CMsqlExecContext::ExecuteStmts<1,1>+0x360
sqlservr!CMsqlExecContext::FExecute+0x55b
sqlservr!CSQLSource::Execute+0x73b
sqlservr!process_request+0x2e3
sqlservr!process_commands+0x2e0
sqlservr!SOS_Task::Param::Execute+0xe2
sqlservr!SOS_Scheduler::RunTask+0xb9
sqlservr!SOS_Scheduler::ProcessTasks+0x141
sqlservr!SchedulerManager::WorkerEntryPoint+0x1a8
sqlservr!SystemThread::RunWorker+0x7f
sqlservr!SystemThreadDispatcher::ProcessWorker+0x225
sqlservr!SchedulerManager::ThreadEntryPoint+0x143
msvcr80!_callthreadstartex+0x1b
msvcr80!_threadstartex+0x66
kernel32!BaseThreadStart+0x34

Orphaned/unreleased Latch:

You will observe that most thread that are stuck behind a latch will contain the LatchBase::Suspend() call. Here is an example

ntdll!ZwWaitForSingleObject+0xa
kernel32!WaitForSingleObjectEx+0x130
sqlservr!SOS_Scheduler::SwitchContext+0x623
sqlservr!SOS_Scheduler::SuspendNonPreemptive+0xc5
sqlservr!EventInternal<Spinlock<149,1,0> >::Wait+0x428
sqlservr! LatchBase::Suspend +0x42c
sqlservr!LatchBase::AcquireInternal+0xfa
sqlservr!CTraceController::FAcquireLock+0x53
sqlservr!CTraceController::CAutoLock::FAcquire+0x18
sqlservr!CTraceController::ProduceRecord+0xb58456
sqlservr!CRpcTraceHelper::TracePostRPC+0x347
sqlservr!CRpcTraceHelper::TracePostExec+0xfc8b44
sqlservr!CRPCExecEnv::OnExecFinish+0xaf
sqlservr!process_request+0x399
sqlservr!process_commands+0x1ba
sqlservr!SOS_Task::Param::Execute+0x11b
sqlservr!SOS_Scheduler::RunTask+0xca
sqlservr!SOS_Scheduler::ProcessTasks+0x95
sqlservr!SchedulerManager::WorkerEntryPoint+0x110
sqlservr!SystemThread::RunWorker+0x60
sqlservr!SystemThreadDispatcher::ProcessWorker+0x12c
sqlservr!SchedulerManager::ThreadEntryPoint+0x12f
msvcr80!_callthreadstartex+0x17
msvcr80!_threadstartex+0x84
kernel32!BaseThreadStart+0x3a

Orphaned/unreleased Spinlock:

Here is what a stack might look like from a thread stuck behind a spinlock - note the SpinlockBase::Sleep()

ntdll!ZwDelayExecution+0xa
kernel32!SleepEx+0xaf
sqlservr! SpinlockBase::Sleep +0x15d5a1
sqlservr!SpinlockBase::SpinToAcquire+0x11b
sqlservr!TSyncHashTable_EntryAccessor<...>::TSyncHashTable_EntryAccessor<...>+0x76
sqlservr!CQSIndexStatsMgr::UsedFor+0x68
sqlservr!CQueryScan::Uncache+0x4788a6
sqlservr!CXStmtQuery::SetupQueryScanAndExpression+0x252
sqlservr!CXStmtQuery::ErsqExecuteQuery+0x232
sqlservr!CXStmtSelect::XretExecute+0x110
sqlservr!CExecStmtLoopVars::ExecuteXStmtAndSetXretReturn+0x13
sqlservr!CMsqlExecContext::ExecuteStmts<1,0>+0x9e5
sqlservr!CMsqlExecContext::FExecute+0x4c5270
sqlservr!CSQLSource::Execute+0x36d
sqlservr!ExecuteSql+0x700
sqlservr!CSpecProc::ExecuteSpecial+0x4c7
sqlservr!CSpecProc::Execute+0x1d5
sqlservr!process_request+0x312
sqlservr!process_commands+0x1c4
sqlservr!SOS_Task::Param::Execute+0xee
sqlservr!SOS_Scheduler::RunTask+0xc9
sqlservr!SOS_Scheduler::ProcessTasks+0xb4
sqlservr!SchedulerManager::WorkerEntryPoint+0xe7
sqlservr!SystemThread::RunWorker+0x59
sqlservr!SystemThreadDispatcher::ProcessWorker+0x130
sqlservr!SchedulerManager::ThreadEntryPoint+0x128
msvcr80!_callthreadstartex+0x17
msvcr80!_threadstartex+0x84
kernel32!BaseThreadStart+0x3a