Tasks, Workers, Threads, Scheduler, Sessions, Connections, Requests: what does it all mean?
Published Feb 10 2019 05:21 PM
Microsoft

First published on MSDN on Dec 13, 2012
With this meditation I attempt to explain some of the more common concepts used in SQL Server thread management and scheduling.

Parable: There was an all-powerful, but humble and benign Master, whom the workers revered and humbly served. The Master accepted requests from other kingdoms and graciously agreed to grant all of them. To do so the Master assigned tasks to his workers (servants), who completed them by cooperating with each other, graciously allowing one another to approach the Master one at a time.


Components:

 

Scheduler (SOS Scheduler): the object that manages thread scheduling in SQL Server and controls which thread is exposed to the CPU (described in sys.dm_os_schedulers). This is the all-powerful but benign and graceful Master whom everyone obeys. He does not preempt the workers; instead he lets them work with each other and relies on their cooperation to yield the CPU (co-operative scheduling mode). Each scheduler/Master (one per logical CPU) accepts new tasks and hands them off to workers. The SOS Scheduler allows only one worker at a time to be exposed to the CPU.
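As a rough illustration of the scheduler description above, the following sketch lists the schedulers that run user work and how busy each one is. The column selection and the filter on 'VISIBLE ONLINE' are my own choices for illustration, not something prescribed by the original article.

```sql
-- One row per scheduler; 'VISIBLE ONLINE' schedulers are the ones that run user work.
SELECT scheduler_id,
       cpu_id,
       status,
       current_tasks_count,    -- tasks currently associated with this scheduler
       runnable_tasks_count,   -- workers with tasks waiting for their turn on the CPU
       current_workers_count,  -- workers associated with this scheduler
       active_workers_count    -- workers that currently have a task assigned
FROM sys.dm_os_schedulers
WHERE status = N'VISIBLE ONLINE'
ORDER BY scheduler_id;
```

A runnable_tasks_count that stays high on many schedulers suggests that workers are queuing for CPU time, that is, waiting for their turn with the Master.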

Task: a task represents the work that needs to be performed (sys.dm_os_tasks). A task wraps one of the following events: a query (RPC event or Language event), a prelogin (prelogin event), a login (connect event), a logout (disconnect event), a query cancellation (Attention event), a bulk load (bulk load event), or a distributed transaction (transaction manager event). Tasks are what the Master is about; they define its existence. Note that tasks are tracked at the SOS Scheduler layer (thus dm_OS_tasks).
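A minimal sketch for looking at the tasks described above. It only reads sys.dm_os_tasks; the ordering is an arbitrary choice made for readability.

```sql
-- One row per active task. session_id/request_id tie the task back to the
-- session and request it serves; worker_address points at the worker running it.
SELECT task_address,
       task_state,        -- e.g. RUNNING, RUNNABLE, SUSPENDED
       session_id,
       request_id,
       exec_context_id,   -- 0 for the parent task of a request
       scheduler_id,
       worker_address
FROM sys.dm_os_tasks
ORDER BY session_id, request_id, exec_context_id;
```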

Worker (worker thread): the logical SQL Server representation of a thread (think of it as a wrapper on top of the OS thread). It is a structure within the Scheduler that maintains SQL Server-specific information about what a worker thread is doing (see sys.dm_os_workers). Workers are the humble servants who carry out the tasks assigned to them by the Master (scheduler).

Thread: the OS thread (sys.dm_os_threads) that is created via calls like CreateThread() / _beginthreadex(). A Worker is mapped 1-to-1 to a Thread.
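The worker-to-thread mapping can be inspected with a join on thread_address. This is a sketch under the assumption that you only care about workers that currently have an OS thread; the chosen columns are illustrative.

```sql
-- Pair each SQL Server worker with the OS thread it wraps.
SELECT w.worker_address,
       w.state          AS worker_state,
       w.is_preemptive,                 -- 1 when the worker runs outside cooperative scheduling
       w.task_address,                  -- NULL when the worker is idle
       t.os_thread_id,                  -- thread ID as seen by the operating system
       t.started_by_sqlservr
FROM sys.dm_os_workers AS w
JOIN sys.dm_os_threads AS t
  ON t.thread_address = w.thread_address
ORDER BY t.os_thread_id;
```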

Request: the logical representation of a query request made from the client application to SQL Server (sys.dm_exec_requests). The request is assigned to a task, which the scheduler hands off to a worker to process. Requests cover query requests as well as system thread operations (like checkpoint, log writer, etc.); you will not find logins, logouts, attentions and the like here. Also, note that this is a representation at the SQL execution engine level (thus dm_EXEC_requests), not at the SOS Scheduler layer.
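To see the requests described above, something like the following sketch may help. Excluding the current session with @@SPID is my own addition so that the monitoring query does not list itself.

```sql
-- Currently executing requests, both user queries and system operations.
SELECT r.session_id,
       r.request_id,
       r.status,
       r.command,             -- e.g. SELECT, ALTER INDEX, CHECKPOINT
       r.wait_type,
       r.cpu_time,            -- milliseconds
       r.logical_reads,
       r.total_elapsed_time   -- milliseconds
FROM sys.dm_exec_requests AS r
WHERE r.session_id <> @@SPID
ORDER BY r.session_id;
```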

Sessions: when the client application connects to SQL Server, the two sides establish a "session" on which to exchange information. Strictly speaking, a session is not the same as the underlying physical connection; it is SQL Server's logical representation of a connection. But for practical purposes, you can think of it as being a connection (session =~ connection); see sys.dm_exec_sessions. This is the old SPID (server process ID) that existed in SQL Server 2000 and earlier. In the case of system sessions (internal sessions spawned by SQL Server like LazyWriter, Checkpoint, Log Writer, Service Broker, etc.), no external physical connection is mapped to the session; only a session_id exists. System sessions typically have session IDs < 50, but the value can be higher.
You may sometimes notice a single session repeating multiple times in the output of a view such as sys.dm_os_tasks or sys.sysprocesses. This happens because of parallel queries. A parallel query uses the same session to communicate with the client, but on the SQL Server side multiple worker threads are assigned to service the query request. So if you see multiple rows with the same session ID, know that the query request is being serviced by multiple threads. A session_id (SPID) is used to identify the work performed inside SQL Server. For example, you may want to find out which session_id is executing a query, or which session_id had its query cancelled. Some of the properties of a session include login time, last request (query) time, CPU consumed, memory used, total elapsed time used by a query or set of queries on this session, user name, and the SET options configured for this session.
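The session properties listed above map onto columns of sys.dm_exec_sessions. The following is a sketch; only two of the SET-option columns are shown, and the selection is illustrative rather than exhaustive.

```sql
-- One row per session, user and system alike.
SELECT session_id,
       is_user_process,          -- 0 for system sessions
       login_name,
       login_time,
       last_request_start_time,
       cpu_time,                 -- milliseconds
       memory_usage,             -- number of 8-KB pages
       total_elapsed_time,       -- milliseconds
       ansi_nulls,               -- examples of per-session SET options
       quoted_identifier
FROM sys.dm_exec_sessions
ORDER BY session_id;
```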

Connections: this is the actual physical connection established at the lower protocol level, with all of its characteristics (sys.dm_exec_connections). For user sessions there is normally a 1:1 mapping between a Session and a Connection; system sessions have no connection. A connection has properties such as protocol (Shared Memory, TCP, Named Pipes), authentication type (NTLM, Kerberos), encryption (on or off), network packet size, and client IP address and port. These are all physical properties of the connection. A connection_id in sys.dm_exec_connections is a GUID and is used to uniquely identify that physical connection. You typically won't use a connection_id to identify which session is executing a query; a session_id is used instead.
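One way to see sessions and connections side by side is a LEFT JOIN from sessions to connections. This is a sketch; the LEFT JOIN is my choice so that system sessions, which have no connection, still appear with NULLs in the connection columns.

```sql
-- Sessions with their physical connection properties, where a connection exists.
SELECT s.session_id,
       s.is_user_process,
       c.connection_id,        -- GUID identifying the physical connection
       c.net_transport,        -- e.g. Shared memory, TCP, Named pipe
       c.auth_scheme,          -- e.g. NTLM, KERBEROS, SQL
       c.encrypt_option,
       c.net_packet_size,
       c.client_net_address,
       c.client_tcp_port
FROM sys.dm_exec_sessions AS s
LEFT JOIN sys.dm_exec_connections AS c
  ON c.session_id = s.session_id
ORDER BY s.session_id;
```

In troubleshooting, the session side tends to answer "who is doing what" questions, while the connection side answers "how did they connect" questions such as protocol, encryption, or client address.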

 

 

 

Interconnection between the Components:

 

A client application creates a physical connection to SQL Server. The application then sends a pre-login request, and a task is created and assigned to a worker to fulfill. Once the server and client finish the pre-login process, a login request is sent and another task is formed and handed off to a worker thread. Once the login is completed, SQL Server creates a session that represents this logical connection, over which it will exchange information with the client. When the client application sends a query request (or a DTC or bulk load request), the server again creates a task and assigns it to a worker thread for completion. If the query is cancelled in the middle of execution for some reason, the server receives an Attention request, upon which the IOCP listener sets a bit marking the query as cancelled; the worker that was running the query stops executing when it sees the bit. If, on the other hand, the query is allowed to complete and the client application is done, it can send a disconnect or logout request, which again is packaged as a task and serviced by a worker.
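The chain described above (session, connection, request, task, worker, thread, scheduler) can be followed in one query. This is a sketch under the assumption that you only want sessions that currently have an executing request; the join keys are the address and ID columns these views expose.

```sql
-- Follow each executing request from its session down to the OS thread and scheduler.
SELECT s.session_id,
       c.net_transport,                 -- NULL for system sessions (no connection)
       r.request_id,
       r.command,
       t.exec_context_id,
       t.task_state,
       w.state          AS worker_state,
       th.os_thread_id,
       sch.scheduler_id,
       sch.cpu_id
FROM sys.dm_exec_sessions AS s
LEFT JOIN sys.dm_exec_connections AS c
  ON c.session_id = s.session_id
JOIN sys.dm_exec_requests AS r
  ON r.session_id = s.session_id
JOIN sys.dm_os_tasks AS t
  ON t.session_id = r.session_id
 AND t.request_id = r.request_id
JOIN sys.dm_os_workers AS w
  ON w.worker_address = t.worker_address
JOIN sys.dm_os_threads AS th
  ON th.thread_address = w.thread_address
JOIN sys.dm_os_schedulers AS sch
  ON sch.scheduler_id = t.scheduler_id
ORDER BY s.session_id, r.request_id, t.exec_context_id;
```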

 

Namaste!

 

Joseph

10 Comments
Copper Contributor
Few questions - 1. DMV data for Connection & Session appear same since those are 1:1 mapped, unless uniquely used in Troubleshooting/Optimisation Process. Can you share any instance wherein we could use these 2 in unison or separately? 2. In which part of Troubleshooting/Optimisation Process are these DMVs used. Any lead is appreciated and would set a starting point for further understanding. Thank You! --In 'thoughts'...
Microsoft

@LonelyRogue - thank you for your question. It prompted me to add more clarifications to the Session and Connection descriptions in order to differentiate them. Please see if those help and provide feedback. Thank you for helping make this better.

Copper Contributor

Hello @Joseph Pilov ,

 

This is a wonderful write-up and it has brought me much closer to understanding the fundamentals of Sessions vs Connections. While reading your original post, I was cross referencing the Microsoft Docs and I was wondering if you could help me better understand one of your points.

 

Regarding a session being potentially broken up into multiple rows due to what you called a "parallel query", would you happen to have some example which I can run that will result in multiple sessions in the sessions DMV for demonstration purposes? Is your term "parallel query" the same as "parallel processing" on a single query statement, or something different?

 

Also, I just wanted to mention that in my research and cross-referencing I found that sys.dm_exec_sessions has a potential to have a one-to-zero relationship to connections, meaning a session does not necessarily have to be associated with a connection. This wasn't mentioned in your post but I thought it would make a great addition here. The only example I could locate of this are what are referred to as "System Sessions". I don't understand what these are but based on the name, I assume these are sessions required for the system running the SQL Server instance to operate that are separate from any user connection.

 

Here's a query to view those: 

SELECT *
FROM sys.dm_exec_sessions 
WHERE is_user_process = 0;

Thanks for your great post!

 

 

 

 

Microsoft

@Database_Nova , thanks for the feedback. It helped me clarify the points you were asking about - I appreciate it. See the updated write-up.

 

Yes, a parallel query is the same as parallel processing done by SQL Server. You submit a query and if it is "too expensive" from the optimizer's point of view AND if it qualifies for parallel processing, SQL Server will break up the work among multiple threads. As a simple example, let's say you submit a query against a huge table and want to get all rows and perform an aggregation: select sum (sale_amount) from InternetSalesTbl. SQL Server could take that clustered index scan and split it among 8 threads, with each of the threads scanning a part of the table in parallel with the other threads. That way you get the work done ideally 8 times faster (not always the reality, because there is overhead to parallelism and because the distribution of the data among the threads may not be an even 1/8 of the table; it is driven by statistics). The same Microsoft Doc I quoted above provides an example for you to look at.

Yes, good point- system sessions  (those created by SQL Server for internal operations like checkpoint, lazy writer, and many others) are not invoked by an external client application. In other words no external connection was open to SQL Server to get those started. Therefore your statement about 1:0 mapping is correct - a session exists, but no corresponding connection to it. Thanks for reminding me of this scenario. I have now captured it in the write up above. Thanks for helping make this write up better for the benefits of many.

Your query example is accurate, but the older sysprocesses system view provides a much better insight as to what those system sessions are doing. Here is a query to help visualize that

 

-- System sessions typically (but not always) have spid < 50
SELECT spid, kpid, lastwaittype, login_time
FROM sys.sysprocesses
WHERE spid < 50;

Copper Contributor

@Joseph Pilov A followup question regarding "parallel processing" and multiple session_id's in the DMV's... you stated "You may sometimes notice a single session repeating multiple times in a DMV output"

 

In what DMVs does that happen?

Then it's kind of hard to figure out the differences between two samples of any DMV's (for working out differences in incremental counters, you need some kind of "uniqueness" or a "logical" Primary Key)

So what can be used as a "unique" column set for those DMV's

 

If there is no "uniqueness" for session_id's that are using DOP (kind of "family_id", or dop_id)

And several records are returned from the DMV's (with the same session_id)

* are all counters the "same" (in other words: do the "workers" report their counter-statistics like "logical_reads" to the "parent/owning" session_id)

    - if all extra/duplicate DOP-session_id has the same values... why are they presented?

* If the extra/duplicate DOP-session_id entries contains counter-statistics that are *individual* for each of the "workers"... Should I just "sum" all incremental counters...

 

I can't just get my head around how DOP "workers" are reported in the DMV's
Any good paper out there that discusses this?

 

 

Another question which is probably "off topic" for this thread is: when/what counters are updated in DMV's

* At the end of SQL statement execution

* or: in real time -- meaning that I can see how many logical reads (etc) the query/session/xxx has been done *right now* (so I can calculate counter per second while the SQL Statement is executing)

 

Microsoft

 

Hello @goran_schwarz

Thanks for your question (s). I think this sample output will answer most (if not all of them). 

The following is the output of a parallel query (actually an ALTER INDEX job) that shows multiple rows for a single session_id. It comes from a join between sysprocesses and sys.dm_exec_requests on one of my colleagues' SQL Server machines while he was running a re-indexing job against his AdventureWorks database. The output was actually produced by queries from this PerfStats script and is trimmed for brevity.

 

 

session_id request_id ecid task_state open_trans request_cpu_time request_logical_reads request_reads request_writes tran_name   scheduler_id command     program_name
---------- ---------- ---- ---------- ---------- ---------------- --------------------- ------------- -------------- ----------- ------------ ----------- ----------------------------------------------
        51          0    3 RUNNING             2           226549               5860461          1427        1086421 ALTER INDEX            1 ALTER INDEX Microsoft SQL Server Management Studio - Query
        51          0    7 RUNNABLE            2           226549               5860461          1427        1086421 ALTER INDEX            3 ALTER INDEX Microsoft SQL Server Management Studio - Query
        51          0    6 RUNNING             2           226549               5860461          1427        1086421 ALTER INDEX            2 ALTER INDEX Microsoft SQL Server Management Studio - Query
        51          0    0 SUSPENDED           2           226549               5860461          1427        1086421 ALTER INDEX            6 ALTER INDEX Microsoft SQL Server Management Studio - Query
        51          0    4 RUNNING             2           226549               5860461          1427        1086421 ALTER INDEX            0 ALTER INDEX Microsoft SQL Server Management Studio - Query
        52          0    0 SUSPENDED           0               36                    48             0              0                        0 WAITFOR     SQLCMD
 



Note that session_id=51 shows up multiple times in the output as separate rows, one for each thread servicing this request (strictly speaking, there were more threads in this output, but a filter was applied so some do not show up). You will observe the same thing with a simple SELECT * FROM sys.dm_os_tasks or SELECT * FROM sys.sysprocesses while a parallel query is running; sys.dm_exec_requests on its own returns a single row per request.

One of the more useful columns in the sys.sysprocesses system view, for me personally, is ECID. ECID shows the execution context ID (an auto-generated number) for a request. When a query runs serially, the ECID is always 0. When a query runs in parallel on multiple threads, the parent ECID is 0 and all the child threads spawned have numbers > 0. In this case not all parallel threads are showing because of some WHERE clause filters, but you can see that ECID = 7 is the highest number here. This means that the request to rebuild an index used at least 8 threads running in parallel (0 through 7).

Note also how the tasks assigned to each thread have different states: some are running, some are waiting to be scheduled (Runnable), and some are suspended. Finally, notice how each of the threads is scheduled on a different scheduler_id (CPU).

 

There are several other useful columns in these two views that can help you identify parallelism and that I want to draw your attention to. One is parallel_worker_count in the sys.dm_exec_requests DMV. The other is the kpid column in sysprocesses, which identifies the Windows thread ID that is executing this part of the task. With these, you can build a query that meets your needs.
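As one possible starting point for such a query, the sketch below counts the tasks behind each request to spot parallel execution. It uses sys.dm_os_tasks rather than the views in the output above; that substitution, and the column aliases, are my own choices.

```sql
-- Requests currently serviced by more than one task, i.e. running in parallel.
SELECT t.session_id,
       t.request_id,
       COUNT(*)                       AS task_count,
       COUNT(DISTINCT t.scheduler_id) AS schedulers_used,
       MAX(t.exec_context_id)         AS highest_exec_context_id
FROM sys.dm_os_tasks AS t
WHERE t.session_id IS NOT NULL
GROUP BY t.session_id, t.request_id
HAVING COUNT(*) > 1
ORDER BY task_count DESC;
```

Grouping by session_id and request_id collapses the repeated rows back into one row per request, which may be convenient when comparing two samples taken at different times.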

 

Finally, I want to call your attention to request_cpu_time, request_logical_reads, request_reads, and request_writes in the above output, to help answer your additional questions. These are aliases for columns you can find in both sys.dm_exec_requests and sys.sysprocesses that measure ongoing CPU, reads, writes, etc.

 

Copper Contributor

Hi Joseph, I'm slightly confused by your use of the terms Worker and Worker Thread interchangeably. Are they one and the same thing?

Microsoft

@YaHozna , yes they are one and the same. I pointed this out in the definition of a worker  "Worker (worker thread)" , but I guess it is confusing. Hope this makes sense. Thanks for your interest in the article

Copper Contributor

Most lucid explanation of how things operate under the hood. 

 

JAT - System SPID's no longer are bound by the < 50 rule anymore as explained in the SystemSessions URL thread by Nova.

 

regards,

Anurag.

Microsoft

@AnuragBhattacharjee , thanks for your comment. Indeed system session_ids can be greater than 50, but typically it is below 50. Thanks for the comment though - I will use it to make that clarification in the text.

Version history
Last update: Jan 25 2021 07:43 AM