From Customer
Interesting point here is, current_workers_count is ~10X compared to active_workers_count; means no tasks assigned to 10X threads.
How did threads land with no tasks? Is there orphaned threads not getting cleaned up?
My Notes
- The total workers are divided evenly among the visible schedulers (schedulers that accept incoming query requests.) For example, a max worker setting of 1000 on a 4-scheduler system is 250 per scheduler.
- Each scheduler maintains a minimum level of workers, so workers don’t have to be created when new work arrives. You can think of them as hot workers.
- Depending on the installation workers greater than minimum are trimmed at different intervals. See Trimming Behavior below.
- Tasks can be transactional or permanent. Requests to the server are transactional. Acquire a task, bind to worker, perform the query request, unbind the worker, and release the task. Requests which are permanent are things like checkpoint, lazy writer, etc. They are bound to worker for the lifetime of the instance.
- Hidden schedulers are used to handle activities such as backup, create and maintain workers.
- The task count includes the transactional and permanent task requests.
- The runnable_tasks_count are those tasks no longer waiting on a resource and available to be scheduled. Scheduling occurs when the current running worker performs a context switch (yields.)
In the output above I see: (sys.dm_os_schedulers (Transact-SQL) - SQL Server | Microsoft Docs)
- current_tasks_count (this includes transactional and permanent tasks)
- runnable_tasks_count (value is low so likely blocking taking place, preemptive workers, waiting dispatchers, or system timer tasks that are not runnable at time of the capture)
- current_workers_count (number of workers created to handle tasks)
- active_workers_count (workers that are bound to tasks in non-preeemptive mode. The difference in current_tasks_count and active_workers_count is likely the preemptive workers)
Here is my takeaway from just this data point.
- You have some preemptive activity to handle things like network completion ports, handling network IO for worker, XEvent outputs, etc. Looking at the sys.dm_exec_requests and joining to sys.dm_os_tasks, sys.dm_os_workers and sys.dm_os_schedulers provides a better picture of the ongoing preemptive work.
- The same data from dm_exec_requests and SQL OS DMVs helps point out the permanent tasks (Checkpoint, LW, Version Store Cleanup, Ghost Cleanup, …)
- There are dispatcher tasks that are maintained by the various dispatchers and are returned (called fading) based on the dispatcher fading timeouts. The dispatcher times out, unbinds the task from the worker and the worker becomes idle. Now the configured SOS idle trimming is applied.
- Some features, such as HADR, maintain their own worker pool on top SOS (much like a dispatcher pool.) These queues and associated tasks have separate idle timeouts. The same DMV queries can point to such tasks. For example, a task could be bound to a worker and assigned to the HADR worker pool for handling HADR requests. If there is no HADR work to perform, then the task is waiting on the HADR queue event. The task is not runnable and the idle timeout for the HADR queue is being applied.
- The dm_os_waits, dm_os_waiting_tasks and such DMVs are helpful in understanding the wait behaviors.
Failed Logins
The reason the above data was captured was in response to failed loging. When I add failed logins as a outcome can fill in a bit more information.
Yes, logins can be impacted when reaching max worker threads. The IO completion port accepts the new connection request and enqueues a task to process the login. This login task is assigned to a visible scheduler. If a worker cannot be bound and process the task before the login timeout is exceeded the login fails. You can reproduce this with a simple test, say max workers is set to 1000. Make 1000 connections (ostress -E -SMyServer -Q”wait for delay ’10:00:00’” -n1000“) which will hold the worker (remember a task is bound to a worker until it completes the task request.) You can reproduce the same outcome with a long blocking chain, etc. Now new connection requests timeout.
There are some additional issues that can lead to login timeout. For example, to combat Denial Of Service (DOS) the SSL responses are often limited to 5 or 10 seconds. If these timeouts occur then the login fails but the task and worker binding was in effect for the login attempt and secondary login attempts would be creating and using additional workers.
- Using the DMVs to understand what the workers are associated with is a great place to start.
- The connectivity ring buffer (sys.dm_os_ring_buffers) is another excellent source of information. The error state and the additional login timing information shows where the time was spent attempting to process the login.
Once we can see the worker assignments and ring buffer information the pattern is commonly easy to see, troubleshoot and resolve.
nocount on
go
use tempdb
go
drop table if exists captureTables
go
create table captureTables(target sysname NOT NULL, source sysname NOT NULL)
go
insert into captureTables values
('cap_dm_os_sys_info', 'sys.dm_os_sys_info'),
('cap_dm_os_schedulers', 'sys.dm_os_schedulers'),
('cap_dm_os_threads', 'sys.dm_os_threads'),
('cap_dm_os_tasks', 'sys.dm_os_tasks'),
('cap_dm_os_workers', 'sys.dm_os_workers'),
('cap_dm_os_dispatchers', 'sys.dm_os_dispatchers'),
('cap_dm_os_dispatcher_pools', 'sys.dm_os_dispatcher_pools'),
('cap_dm_exec_requests', 'sys.dm_exec_requests'),
('cap_dm_exec_sessions', 'sys.dm_exec_sessions')
go
------------------------------------------------
-- Cleanup previous execution
------------------------------------------------
drop procedure if exists spCleanup
go
create procedure spCleanup
as
begin
set nocount on
declare target sysname
select \* into #captureTables from captureTables
select TOP 1 target = target from #captureTables
while (@target is not null)
begin
exec('drop table if exists ' + target)
delete from #captureTables where target = target
set target = null
select TOP 1 target = target from #captureTables
end
end
go
------------------------------------------------
-- Capture data
------------------------------------------------
drop procedure if exists spCapture
go
create procedure spCapture
as
begin
set nocount on
declare @captureTime sysname= convert(sysname, GetUTCDate(), 109)
declare target sysname
declare @source sysname
-- Avoid cursor overhead
--
select \* into #captureTables from captureTables
select TOP 1 target = target, @source = source from #captureTables
while (@target is not null)
begin
print 'Processing ' + target
if (object\_id(@target) is null)
begin
exec('select cast(''' + @captureTime + ''' as datetime) as \[CaptureTime\], \* into ' + target + ' from ' + @source)
end
else
begin
exec('insert into ' + target + ' select cast(''' + @captureTime + ''' as datetime) as \[CaptureTime\], \* from ' + @source)
end
-- exec('select \* from ' + target)
delete from #captureTables where target = target
set target = null
select TOP 1 target = target, @source = source from #captureTables
end
end
go
exec spCleanup
go
exec spCapture
waitfor delay '00:00:05'
exec spCapture
go
------------------------------------------------
-- Report outs
------------------------------------------------
print 'System Info'
go
select
CaptureTime,
cpu_ticks - LAG(cpu_ticks,1) OVER (ORDER BY CaptureTime) as cpu_ticks_delta,
ms_ticks - LAG(ms_ticks,1) OVER (ORDER BY CaptureTime) as ms_ticks_delta,
committed_kb - LAG(committed_kb,1) OVER (ORDER BY CaptureTime) as committed_kb_delta,
max_workers_count - LAG(max_workers_count,1) OVER (ORDER BY CaptureTime) as max_workers_count_delta,
process_kernel_time_ms - LAG(process_kernel_time_ms,1) OVER (ORDER BY CaptureTime) as process_kernel_time_ms_delta,
process_user_time_ms - LAG(process_user_time_ms,1) OVER (ORDER BY CaptureTime) as process_user_ms_delta
, *
from cap_dm_os_sys_info
go
print 'Scheduler Info'
go
select
CaptureTime,
scheduler_address,
context_switches_count - LAG(context_switches_count,1) OVER (PARTITION BY scheduler_address ORDER BY CaptureTime) as context_switch_delta,
preemptive_switches_count - LAG(preemptive_switches_count,1) OVER (PARTITION BY scheduler_address ORDER BY CaptureTime) as preemptive_switch_delta,
current_tasks_count - LAG(current_tasks_count,1) OVER (PARTITION BY scheduler_address ORDER BY CaptureTime) as current_tasks_count_delta,
runnable_tasks_count - LAG(runnable_tasks_count,1) OVER (PARTITION BY scheduler_address ORDER BY CaptureTime) as runnable_tasks_count_delta,
yield_count - LAG(yield_count,1) OVER (PARTITION BY scheduler_address ORDER BY CaptureTime) as yield_count_delta,
total_cpu_usage_ms - LAG(total_cpu_usage_ms,1) OVER (PARTITION BY scheduler_address ORDER BY CaptureTime) as total_cpu_usage_ms_delta,
*
from cap_dm_os_schedulers
order by 8 desc
go
print 'Thread Info'
go
select
CaptureTime,
os_thread_id,
kernel_time - LAG(kernel_time,1) OVER (PARTITION BY os_thread_id ORDER BY CaptureTime) as kernel_time_delta,
usermode_time - LAG(usermode_time,1) OVER (PARTITION BY os_thread_id ORDER BY CaptureTime) as user_time_delta,
*
from cap_dm_os_threads
order by 4 desc
go
print 'Task Info'
go
select
CaptureTime,
task_address,
context_switches_count - LAG(context_switches_count,1) OVER (PARTITION BY task_address ORDER BY CaptureTime) as context_switch_delta,
*
from cap_dm_os_tasks
order by 3 desc
go
print 'Worker Info'
go
select
CaptureTime,
worker_address,
context_switch_count - LAG(context_switch_count,1) OVER (PARTITION BY worker_address ORDER BY CaptureTime) as context_switch_delta,
quantum_used - LAG(quantum_used,1) OVER (PARTITION BY worker_address ORDER BY CaptureTime) as quantum_used_delta,
*
from cap_dm_os_workers
order by 4 desc
go
print 'Requests Info'
go
select
CaptureTime,
session_id,
request_id,
reads - LAG(reads,1) OVER (PARTITION BY session_id, request_id ORDER BY CaptureTime) as reads_delta,
logical_reads - LAG(logical_reads,1) OVER (PARTITION BY session_id, request_id ORDER BY CaptureTime) as logical_reads_delta,
writes - LAG(writes,1) OVER (PARTITION BY session_id, request_id ORDER BY CaptureTime) as writes_delta,
cpu_time - LAG(cpu_time,1) OVER (PARTITION BY session_id, request_id ORDER BY CaptureTime) as cpu_time_delta,
*
from cap_dm_exec_requests
order by 7 desc
go
print 'Sessions Info'
go
select
CaptureTime,
session_id,
cpu_time - LAG(cpu_time,1) OVER (PARTITION BY session_id ORDER BY CaptureTime) as cpu_time_delta,
total_scheduled_time - LAG(total_scheduled_time,1) OVER (PARTITION BY session_id ORDER BY CaptureTime) as total_scheduled_time_delta,
reads - LAG(reads,1) OVER (PARTITION BY session_id ORDER BY CaptureTime) as reads_delta,
logical_reads - LAG(logical_reads,1) OVER (PARTITION BY session_id ORDER BY CaptureTime) as logical_reads_delta,
writes - LAG(writes,1) OVER (PARTITION BY session_id ORDER BY CaptureTime) as writes_delta,
*
from cap_dm_exec_sessions
order by 4 desc
go
print 'Dispatcher Info'
go
select
d.CaptureTime,
dispatcher_pool_address,
d.task_address,
items_processed - LAG(items_processed,1) OVER (PARTITION BY dispatcher_pool_address, d.task_address ORDER BY d.CaptureTime) as items_processed_delta,
w.quantum_used - LAG(w.quantum_used,1) OVER (PARTITION BY dispatcher_pool_address, d.task_address ORDER BY d.CaptureTime) as quantum_used_delta,
tt.kernel_time - LAG(tt.kernel_time,1) OVER (PARTITION BY dispatcher_pool_address, d.task_address ORDER BY d.CaptureTime) as kernel_time_delta,
tt.usermode_time - LAG(tt.usermode_time,1) OVER (PARTITION BY dispatcher_pool_address, d.task_address ORDER BY d.CaptureTime) as usermode_time_delta,
*
from cap_dm_os_dispatchers d
inner join cap_dm_os_tasks t on t.task_address = d.task_address and t.CaptureTime = d.CaptureTime
inner join cap_dm_os_workers w on w.task_address = t.task_address and w.CaptureTime = t.CaptureTime
inner join cap_dm_os_threads tt on tt.thread_address = w.thread_address and w.CaptureTime = tt.CaptureTime
order by 7 desc
go
print 'Dispatcher Pool Info'
go
select
CaptureTime,
dispatcher_pool_address,
dispatcher_count - LAG(dispatcher_count,1) OVER (PARTITION BY dispatcher_pool_address ORDER BY CaptureTime) as dispatcher_count_delta,
dispatcher_waiting_count - LAG(dispatcher_waiting_count,1) OVER (PARTITION BY dispatcher_pool_address ORDER BY CaptureTime) as dispatcher_waiting_count_delta,
queue_length - LAG(queue_length,1) OVER (PARTITION BY dispatcher_pool_address ORDER BY CaptureTime) as queue_length_delta,
*
from cap_dm_os_dispatcher_pools
order by 5 desc
go
print 'Mappings'
select
ss.*, r.*, w.*, ts.*, t.*, s.*
from cap_dm_os_schedulers s
inner join cap_dm_os_workers w on w.scheduler_address = s.scheduler_address and s.CaptureTime = w.CaptureTime
inner join cap_dm_os_threads t on t.thread_address = w.thread_address and t.CaptureTime = w.CaptureTime
inner join cap_dm_os_tasks ts on ts.task_address = w.task_address and ts.CaptureTime = w.CaptureTime
inner join cap_dm_exec_requests r on r.task_address = ts.task_address and r.CaptureTime = ts.CaptureTime
inner join cap_dm_exec_sessions ss on ss.session_id = r.session_id and ss.CaptureTime = r.CaptureTime
order by ss.session_id, r.request_id, ss.CaptureTime
go
Trimming Behavior
The default idle worker timeout is 1500 seconds (BOX, SQLVM.) However, various deployments may adjust the timeout threshold.
Sub-Core set to 120 seconds
Data Warehouse DMS set to 120 seconds
Data Warehouse Engine set to 120 seconds
SQLDB instance set to 900 sec
...