Lesson Learned #404: Understanding and Addressing 100% Worker Utilization
Published Jul 20 2023 07:53 AM 5,863 Views

Introduction:

 

Worker utilization is a critical aspect of Azure SQL Database performance. When the percentage of workers reaches 100%, it can lead to database unresponsiveness, impacting application performance and user experience. In this comprehensive article, we will explore the reasons behind 100% worker utilization and provide detailed solutions to optimize and enhance the performance of your Azure SQL Database.

 

Understanding Worker Utilization in Azure SQL Database:

 

Workers in Azure SQL Database are responsible for executing queries and processing various tasks. Each worker represents a logical thread that can handle a specific request. When the worker utilization reaches 100%, it indicates that all workers are occupied, and any new requests will have to wait until a worker becomes available.

 

Reasons for 100% Worker Utilization:

 

  1. Incorrect MAXDOP Setting: The Maximum Degree of Parallelism (MAXDOP) determines the maximum number of processors that can be used to execute a query in parallel. An incorrect MAXDOP setting can lead to inefficient resource utilization and a higher worker workload.

  2. Excessive Concurrent Connection Attempts: A large number of simultaneous connection attempts can overwhelm the system, depleting available workers and impacting performance.

  3. Numerous Concurrent Requests with Slow Responses: Slow queries and long-running requests can accumulate and increase worker utilization.

  4. Automatic Update Statistics: Azure SQL Database has a feature called AUTO_UPDATE_STATISTICS that automatically updates query statistics for tables when certain thresholds are met. Outdated statistics can lead to poor query performance, causing excessive worker utilization.

  5. Multiple Blocking Issues: If there are multiple blocking processes in the database, it can create a chain reaction, leading to a high number of waiting requests and reaching 100% worker utilization.

 

Solution 1: Optimizing MAXDOP Setting for Parallelism:

 

Optimizing the MAXDOP setting is crucial for achieving efficient parallelism and avoiding 100% worker utilization. To achieve this, follow these steps:

 

Step 1: Monitor MAXDOP Utilization with DMVs: 

 

 

SELECT TOP 10
    qs.execution_count,
    qs.total_worker_time,
    qs.total_dop,
    qs.last_dop,
    qs.min_dop,
    qs.max_dop,
    st.text AS query_text,
    qp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
ORDER BY qs.total_worker_time DESC;

 

 

Step 2: Analyze MAXDOP Utilization Results:

 

Review the output to identify queries with high total_worker_time and analyze their parallelism settings (total_dop, last_dop, min_dop, max_dop). Adjust the MAXDOP setting for specific queries using the OPTION (MAXDOP <value>) hint or server-level configuration based on the performance and resource utilization.

 

Solution 2: Managing Concurrent Connection Attempts:

 

To handle concurrent connection attempts, consider the following strategies:

Step 1: Implement a connection pooling mechanism in your application to efficiently manage and reuse connections.

Step 2: Configure connection timeouts and retries to avoid overwhelming the system with excessive connection requests.

 

Solution 3: Optimizing Slow Queries and Requests:

 

To optimize slow queries and requests, follow these steps:

Step 1: Identify and analyze slow-performing queries using query execution plans and DMVs like sys.dm_exec_requests.

Step 2: Optimize queries by adding appropriate indexes, rewriting queries, or leveraging query hints. Step 3: Implement caching mechanisms to reduce redundant queries and improve response times.

 

Solution 4: Monitoring and Addressing Blocking Issues:

 

To address multiple blocking issues, consider the following steps:

Step 1: Monitor the database for blocking using DMVs like sys.dm_os_waiting_tasks and sys.dm_exec_requests.

Step 2: Identify the root cause of the blocking and make necessary adjustments, such as optimizing indexes or redesigning transactions.

 

Using ostress.exe to Simulate High Worker Utilization:

Example 1: Simulating Concurrent Connection Attempts

 

 

ostress.exe -S <server_name> -d <database_name> -U <username> -P <password> -Q "WAITFOR DELAY '00:00:10'" -r 100 -n 200

 

 

If we need to know all the details, here's the query to get all the details of active connections, executed commands, client IP addresses, and application names in Azure SQL Database using Dynamic Management Views (DMVs):

 

 

SELECT
    s.session_id,
    c.client_net_address AS [Client IP Address],
    s.host_name AS [Host Name],
    s.program_name AS [Application Name],
    c.connect_time AS [Connection Time],
    r.command AS [Command],
    t.text AS [SQL Text]
FROM sys.dm_exec_sessions AS s
JOIN sys.dm_exec_connections AS c ON s.session_id = c.session_id
CROSS APPLY sys.dm_exec_sql_text(c.most_recent_sql_handle) AS t
CROSS APPLY sys.dm_exec_requests AS r ON s.session_id = r.session_id
WHERE s.is_user_process = 1
  AND s.status = 'running'
ORDER BY s.session_id;

 

 

This query will provide you with a list of all active connections, including details such as the client's IP address, host name, application name, connection time, the current command being executed, and the SQL text of the most recent query in each session.

 

With this information, you can identify and monitor active connections and their activity, gaining a more detailed insight into the performance and worker utilization in your Azure SQL Database.

 

Example 2: Simulating Slow Queries with Delay

 

 

ostress.exe -S <server_name> -d <database_name> -U <username> -P <password> -Q "WAITFOR DELAY '00:00:05'; SELECT * FROM YourTable" -r 50 -n 150

 

 

Using the previous query to obtain connections, request, etc.. we could identify the issue. 

 

Example 3: Obtain blocking issues with Delay

 

 

SELECT
    r.session_id,
    r.request_id,
    r.command,
    r.status,
    r.wait_type,
    r.wait_time,
    r.last_wait_type,
    r.wait_resource,
    r.cpu_time,
    r.total_elapsed_time,
    r.logical_reads,
    r.reads,
    r.writes,
    r.transaction_isolation_level,
    r.row_count,
    t.text AS [SQL Text],
    c.client_net_address AS [Client IP Address],
    s.host_name AS [Host Name],
    s.program_name AS [Application Name],
    s.login_name AS [Login Name],
    r.blocking_session_id AS [Blocking Session ID],
    br.blocking_session_id AS [Blocked Session ID],
    blocking_text.text AS [Blocking SQL Text]
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.plan_handle) AS t
LEFT JOIN sys.dm_exec_connections AS c ON r.session_id = c.session_id
LEFT JOIN sys.dm_exec_sessions AS s ON r.session_id = s.session_id
LEFT JOIN sys.dm_exec_requests AS br ON r.blocking_session_id = br.session_id
OUTER APPLY sys.dm_exec_sql_text(br.plan_handle) AS blocking_text
ORDER BY r.total_elapsed_time DESC;

 

 

Monitoring Worker Utilization with DMVs:

Azure SQL Database provides Dynamic Management Views (DMVs) to monitor worker utilization:

  1. sys.dm_os_schedulers: Provides information about schedulers managing workers' execution.

  2. sys.dm_exec_requests: Contains details about each request or command running in the database, including worker utilization.

  3. sys.dm_exec_sessions: Offers insights into active sessions connected to the database, along with worker utilization details.

 

Conclusion:

 

Optimizing worker utilization in Azure SQL Database is crucial for maintaining a responsive and high-performance database. By setting an appropriate MAXDOP value, managing concurrent connection attempts, optimizing slow queries, addressing blocking issues, and simulating scenarios using ostress.exe, you can prevent worker utilization from reaching 100%. Regularly monitoring worker utilization using DMVs will help you identify bottlenecks and make informed decisions to improve overall performance. 

 

Enjoy!

1 Comment
Version history
Last update:
‎Jul 20 2023 08:09 AM
Updated by: