Lesson Learned #301:How to check the performance of my database? Resource Consumption and Wait Stats
Published Jan 22 2023 12:15 PM 4,136 Views

In several service requests our customers asked how to check the performance of a database. During this discussion we reached to discuss why is important to check Wait Stats, for example, CXPACKET, SOS_SCHEDULING_YIELD, PAGEIOLATCH, PAGELATCH, etc.. among others, because that means what is waiting for SQL Server to execute the workload.

 

In a performance scenario, normally, I used to review two main things:

 

  • Resources consumption:
    • High CPU: might means that we have Conversion Implicit, missing indexes, a lot of workers working in parallel, a lot of single process running concurrently, bad execution plan based on statistics, parameter sniffing, etc..
    • High DataIO: might means that we have missing indexes, reading too much information, not enough memory to manage all the queries, etc..
    • High LogIO: might means that we have a lot of operations updating data and writing data in the transaction log. 

 

  • Wait stats.
    • We have plenty of wait stats, but, for me it is very important to know how SQL Server works to understand it.
    • SQL server is a multi-threaded application and is responsible to manage in a cooperative mode (almost) and their synchronizations. Before SQL Server 2005 all the work was based on Windows multi-threaded but in SQL SERVER 2005 was introduced a new feature called SQLOS with a new design to scheduling the different tasks that SQL Server.
    • Every time that a query is submitted, SQL Server prepares a task to execute, with operations to be done. 
      • To scheduling all these process, SQL Server has a list of workers. 
      • SQL Server has a special queue with workers to assign individually the task (depending on the number of vCores that you have will more or less long).
      • In thie process to assign a worker, we have our first wait stats, based on the classification of Running, Runnable or suspended (among others) that indicates if the worker is running, is pending to be assigned to a worker or if suspended waiting for resources or other. 
      • Once SQLOS is assign a worker, it will wait for a CPU (scheduler in SQL Server) to execute the part of the operator the TSQL command needs to execute, let's say Table Scan, Sorting data, etc.. 
    • The most interesting part of scheduling is when a worker has to wait for a resource before it can continue: 
      • Suppose that it is needed to read a page that it is not in memory, so we need to run an asynchronous physical I/O and then has to wait, off the processor, until the I/O completes. 
      • Other example, is when it is needed to acquire a share lock on a row to read it, but another process already holds a conflicting exclusive lock while it is updating the row. 
      • In all these situations, when that happens, SQL Server makes a note of why the thread had to wait and this is called the wait type. Some examples of this are: 
        • When waiting for a page to be read into memory the wait type is PAGEIOLATCH_SH (if the thread is waiting for a page that it will change, the wait type is PAGEIOLATCH_EX). 
        • When waiting for a share lock on a row, the wait type is LCK_M_S (lock-mode-share) 
      • The most important thing is SQL Server keeps track of how long has been the wait. This is called the resource wait time and is usually just known as the wait time
      • The overall set of metrics of how many threads have waited for which resources and for how long on average is called wait statistics.
      • We have a special scenario, is when the task needs to execute the query in parallel.
        • In this situation, we need to create several tasks to execute part of the operator needed (for example, scanning a table) and report the data to the main task.
        • Every task will have a worker to execute and the main wait stat (among others will be CXPACKET). For this reason, when we have questions from our customer what is the best number to limit how many tasks will run in parallel. It is dificult, because, running in parallel is a good approach to reduce the time, but, if all the workers are in used or all CPU (schedulers) are busy, most probablly our SQL Server will take time to execute the query. 
    • In summary, to know much time are taking these wait stats to perform all operations is a good point to review. Of course, It is not bad to have a Wait Stat, that means that your SQL Server is working. Specific types of wait times during query execution can indicate bottlenecks or stall points within the query. Similarly, high wait times, or wait counts server wide can indicate bottlenecks in interaction query interactions. The basic idea is why have to stop and wait, and what they’re waiting for, besides, other troubleshooting stuff, can point us in a direction to take. 

 

Jose_Manuel_Jurado_0-1674417385244.png

 

I use this TSQL to obtain more data about task, workers and schedulers.

 

 

select tasks.task_address, parent_task_address, task_state, 
exec_context_id,
tasks.scheduler_id, work.worker_address, tasks.session_id, req.status, req.command, 
blocking_session_id, wait_type, wait_time, req.last_wait_type, cpu_time, dop, 
sched.scheduler_address, sched.parent_node_id, sched.status, sched.is_online, sched.active_worker_address,
substring
(REPLACE
(REPLACE
(SUBSTRING
(ST.text
, (req.statement_start_offset/2) + 1
, (
(CASE statement_end_offset
WHEN -1
THEN DATALENGTH(ST.text)
ELSE req.statement_end_offset
END
- req.statement_start_offset)/2) + 1)
, CHAR(10), ' '), CHAR(13), ' '), 1, 512) AS statement_text, * from sys.dm_exec_requests req
join sys.dm_os_workers work on req.task_address = work.task_address 
join sys.dm_os_tasks tasks on req.session_id = tasks.session_id 
join sys.dm_os_schedulers sched on sched.scheduler_id = tasks.scheduler_id
CROSS APPLY sys.dm_exec_sql_text(req.sql_handle) as ST
where req.status <> 'background' and req.session_id<> @@spid
order by req.session_id,tasks.scheduler_id, work.worker_address, req.status

 

 

Updated .- (26 Jan) - We recorded a video about these topics:

 

 

Enjoy!

 

 

Version history
Last update:
‎Jan 26 2023 06:57 AM
Updated by: