From Customer
Interesting point here is, current_workers_count is ~10X compared to active_workers_count; means no tasks assigned to 10X threads.
How did threads land with no tasks? Is there orphaned threads not getting cleaned up?
My Notes
In the output above I see: (sys.dm_os_schedulers (Transact-SQL) - SQL Server | Microsoft Docs)
Here is my takeaway from just this data point.
Failed Logins
The reason the above data was captured was in response to failed loging. When I add failed logins as a outcome can fill in a bit more information.
Yes, logins can be impacted when reaching max worker threads. The IO completion port accepts the new connection request and enqueues a task to process the login. This login task is assigned to a visible scheduler. If a worker cannot be bound and process the task before the login timeout is exceeded the login fails. You can reproduce this with a simple test, say max workers is set to 1000. Make 1000 connections (ostress -E -SMyServer -Q”wait for delay ’10:00:00’” -n1000“) which will hold the worker (remember a task is bound to a worker until it completes the task request.) You can reproduce the same outcome with a long blocking chain, etc. Now new connection requests timeout.
There are some additional issues that can lead to login timeout. For example, to combat Denial Of Service (DOS) the SSL responses are often limited to 5 or 10 seconds. If these timeouts occur then the login fails but the task and worker binding was in effect for the login attempt and secondary login attempts would be creating and using additional workers.
Once we can see the worker assignments and ring buffer information the pattern is commonly easy to see, troubleshoot and resolve.
nocount on
go
use tempdb
go
drop table if exists captureTables
go
create table captureTables(target sysname NOT NULL, source sysname NOT NULL)
go
insert into captureTables values
('cap_dm_os_sys_info', 'sys.dm_os_sys_info'),
('cap_dm_os_schedulers', 'sys.dm_os_schedulers'),
('cap_dm_os_threads', 'sys.dm_os_threads'),
('cap_dm_os_tasks', 'sys.dm_os_tasks'),
('cap_dm_os_workers', 'sys.dm_os_workers'),
('cap_dm_os_dispatchers', 'sys.dm_os_dispatchers'),
('cap_dm_os_dispatcher_pools', 'sys.dm_os_dispatcher_pools'),
('cap_dm_exec_requests', 'sys.dm_exec_requests'),
('cap_dm_exec_sessions', 'sys.dm_exec_sessions')
go
------------------------------------------------
-- Cleanup previous execution
------------------------------------------------
drop procedure if exists spCleanup
go
create procedure spCleanup
as
begin
set nocount on
declare @target sysname
select \* into #captureTables from captureTables
select TOP 1 @target = target from #captureTables
while (@target is not null)
begin
exec('drop table if exists ' + @target)
delete from #captureTables where target = @target
set @target = null
select TOP 1 @target = target from #captureTables
end
end
go
------------------------------------------------
-- Capture data
------------------------------------------------
drop procedure if exists spCapture
go
create procedure spCapture
as
begin
set nocount on
declare @captureTime sysname= convert(sysname, GetUTCDate(), 109)
declare @target sysname
declare @source sysname
-- Avoid cursor overhead
--
select \* into #captureTables from captureTables
select TOP 1 @target = target, @source = source from #captureTables
while (@target is not null)
begin
print 'Processing ' + @target
if (object\_id(@target) is null)
begin
exec('select cast(''' + @captureTime + ''' as datetime) as \[CaptureTime\], \* into ' + @target + ' from ' + @source)
end
else
begin
exec('insert into ' + @target + ' select cast(''' + @captureTime + ''' as datetime) as \[CaptureTime\], \* from ' + @source)
end
-- exec('select \* from ' + @target)
delete from #captureTables where target = @target
set @target = null
select TOP 1 @target = target, @source = source from #captureTables
end
end
go
exec spCleanup
go
exec spCapture
waitfor delay '00:00:05'
exec spCapture
go
------------------------------------------------
-- Report outs
------------------------------------------------
print 'System Info'
go
select
CaptureTime,
cpu_ticks - LAG(cpu_ticks,1) OVER (ORDER BY CaptureTime) as cpu_ticks_delta,
ms_ticks - LAG(ms_ticks,1) OVER (ORDER BY CaptureTime) as ms_ticks_delta,
committed_kb - LAG(committed_kb,1) OVER (ORDER BY CaptureTime) as committed_kb_delta,
max_workers_count - LAG(max_workers_count,1) OVER (ORDER BY CaptureTime) as max_workers_count_delta,
process_kernel_time_ms - LAG(process_kernel_time_ms,1) OVER (ORDER BY CaptureTime) as process_kernel_time_ms_delta,
process_user_time_ms - LAG(process_user_time_ms,1) OVER (ORDER BY CaptureTime) as process_user_ms_delta
, *
from cap_dm_os_sys_info
go
print 'Scheduler Info'
go
select
CaptureTime,
scheduler_address,
context_switches_count - LAG(context_switches_count,1) OVER (PARTITION BY scheduler_address ORDER BY CaptureTime) as context_switch_delta,
preemptive_switches_count - LAG(preemptive_switches_count,1) OVER (PARTITION BY scheduler_address ORDER BY CaptureTime) as preemptive_switch_delta,
current_tasks_count - LAG(current_tasks_count,1) OVER (PARTITION BY scheduler_address ORDER BY CaptureTime) as current_tasks_count_delta,
runnable_tasks_count - LAG(runnable_tasks_count,1) OVER (PARTITION BY scheduler_address ORDER BY CaptureTime) as runnable_tasks_count_delta,
yield_count - LAG(yield_count,1) OVER (PARTITION BY scheduler_address ORDER BY CaptureTime) as yield_count_delta,
total_cpu_usage_ms - LAG(total_cpu_usage_ms,1) OVER (PARTITION BY scheduler_address ORDER BY CaptureTime) as total_cpu_usage_ms_delta,
*
from cap_dm_os_schedulers
order by 8 desc
go
print 'Thread Info'
go
select
CaptureTime,
os_thread_id,
kernel_time - LAG(kernel_time,1) OVER (PARTITION BY os_thread_id ORDER BY CaptureTime) as kernel_time_delta,
usermode_time - LAG(usermode_time,1) OVER (PARTITION BY os_thread_id ORDER BY CaptureTime) as user_time_delta,
*
from cap_dm_os_threads
order by 4 desc
go
print 'Task Info'
go
select
CaptureTime,
task_address,
context_switches_count - LAG(context_switches_count,1) OVER (PARTITION BY task_address ORDER BY CaptureTime) as context_switch_delta,
*
from cap_dm_os_tasks
order by 3 desc
go
print 'Worker Info'
go
select
CaptureTime,
worker_address,
context_switch_count - LAG(context_switch_count,1) OVER (PARTITION BY worker_address ORDER BY CaptureTime) as context_switch_delta,
quantum_used - LAG(quantum_used,1) OVER (PARTITION BY worker_address ORDER BY CaptureTime) as quantum_used_delta,
*
from cap_dm_os_workers
order by 4 desc
go
print 'Requests Info'
go
select
CaptureTime,
session_id,
request_id,
reads - LAG(reads,1) OVER (PARTITION BY session_id, request_id ORDER BY CaptureTime) as reads_delta,
logical_reads - LAG(logical_reads,1) OVER (PARTITION BY session_id, request_id ORDER BY CaptureTime) as logical_reads_delta,
writes - LAG(writes,1) OVER (PARTITION BY session_id, request_id ORDER BY CaptureTime) as writes_delta,
cpu_time - LAG(cpu_time,1) OVER (PARTITION BY session_id, request_id ORDER BY CaptureTime) as cpu_time_delta,
*
from cap_dm_exec_requests
order by 7 desc
go
print 'Sessions Info'
go
select
CaptureTime,
session_id,
cpu_time - LAG(cpu_time,1) OVER (PARTITION BY session_id ORDER BY CaptureTime) as cpu_time_delta,
total_scheduled_time - LAG(total_scheduled_time,1) OVER (PARTITION BY session_id ORDER BY CaptureTime) as total_scheduled_time_delta,
reads - LAG(reads,1) OVER (PARTITION BY session_id ORDER BY CaptureTime) as reads_delta,
logical_reads - LAG(logical_reads,1) OVER (PARTITION BY session_id ORDER BY CaptureTime) as logical_reads_delta,
writes - LAG(writes,1) OVER (PARTITION BY session_id ORDER BY CaptureTime) as writes_delta,
*
from cap_dm_exec_sessions
order by 4 desc
go
print 'Dispatcher Info'
go
select
d.CaptureTime,
dispatcher_pool_address,
d.task_address,
items_processed - LAG(items_processed,1) OVER (PARTITION BY dispatcher_pool_address, d.task_address ORDER BY d.CaptureTime) as items_processed_delta,
w.quantum_used - LAG(w.quantum_used,1) OVER (PARTITION BY dispatcher_pool_address, d.task_address ORDER BY d.CaptureTime) as quantum_used_delta,
tt.kernel_time - LAG(tt.kernel_time,1) OVER (PARTITION BY dispatcher_pool_address, d.task_address ORDER BY d.CaptureTime) as kernel_time_delta,
tt.usermode_time - LAG(tt.usermode_time,1) OVER (PARTITION BY dispatcher_pool_address, d.task_address ORDER BY d.CaptureTime) as usermode_time_delta,
*
from cap_dm_os_dispatchers d
inner join cap_dm_os_tasks t on t.task_address = d.task_address and t.CaptureTime = d.CaptureTime
inner join cap_dm_os_workers w on w.task_address = t.task_address and w.CaptureTime = t.CaptureTime
inner join cap_dm_os_threads tt on tt.thread_address = w.thread_address and w.CaptureTime = tt.CaptureTime
order by 7 desc
go
print 'Dispatcher Pool Info'
go
select
CaptureTime,
dispatcher_pool_address,
dispatcher_count - LAG(dispatcher_count,1) OVER (PARTITION BY dispatcher_pool_address ORDER BY CaptureTime) as dispatcher_count_delta,
dispatcher_waiting_count - LAG(dispatcher_waiting_count,1) OVER (PARTITION BY dispatcher_pool_address ORDER BY CaptureTime) as dispatcher_waiting_count_delta,
queue_length - LAG(queue_length,1) OVER (PARTITION BY dispatcher_pool_address ORDER BY CaptureTime) as queue_length_delta,
*
from cap_dm_os_dispatcher_pools
order by 5 desc
go
print 'Mappings'
select
ss.*, r.*, w.*, ts.*, t.*, s.*
from cap_dm_os_schedulers s
inner join cap_dm_os_workers w on w.scheduler_address = s.scheduler_address and s.CaptureTime = w.CaptureTime
inner join cap_dm_os_threads t on t.thread_address = w.thread_address and t.CaptureTime = w.CaptureTime
inner join cap_dm_os_tasks ts on ts.task_address = w.task_address and ts.CaptureTime = w.CaptureTime
inner join cap_dm_exec_requests r on r.task_address = ts.task_address and r.CaptureTime = ts.CaptureTime
inner join cap_dm_exec_sessions ss on ss.session_id = r.session_id and ss.CaptureTime = r.CaptureTime
order by ss.session_id, r.request_id, ss.CaptureTime
go
Trimming Behavior
The default idle worker timeout is 1500 seconds (BOX, SQLVM.) However, various deployments may adjust the timeout threshold.
Sub-Core set to 120 seconds
Data Warehouse DMS set to 120 seconds
Data Warehouse Engine set to 120 seconds
SQLDB instance set to 900 sec
...
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.