Improving concurrency of asynchronous statistics update
Published Jun 16 2020 07:45 AM 13.6K Views
Microsoft

Executive summary

 

In Azure SQL Database and Azure SQL Managed Instance, the background process that updates statistics asynchronously can now wait for the schema modification lock on a low priority queue. This improves concurrency for workloads with frequent query plan (re)compilations.

 

New behavior is enabled with the ASYNC_STATS_UPDATE_WAIT_AT_LOW_PRIORITY database-scoped configuration

 

Automatic statistics update – synchronous vs. asynchronous

 

By default, the SQL Server database engine updates statistics automatically as needed to reflect changes in the underlying data, and help the query optimizer generate more optimal plans. When a query plan is compiled, if existing statistics are considered out-of-date (stale), new statistics are collected and written to the database metadata. By default, this happens synchronously with query execution, therefore the time to collect and write new statistics is added to the execution time of the query being compiled.

 

While up-to-date statistics often improve query plan quality, the extra time added to some query executions due to statistics update may be undesirable, particularly in transactional workloads with short queries, where updating statistics may take longer than query execution itself. For this reason, the SQL Server database engine also supports an option to update statistics asynchronously. When the database option AUTO_UPDATE_STATISTICS_ASYNC is set to ON, query optimizer proceeds to compile and execute the query even if statistics are considered stale; however, stale statistics are then updated on a background thread asynchronously, so that future query executions starting after this asynchronous process has completed can benefit from up-to-date statistics.

 

Potential concurrency impact of asynchronous statistics update

 

While asynchronous statistics update eliminates occasional spikes in query duration due to statistics update, it can sometimes cause a different, concurrency-related, problem, particularly for workloads where query plans are frequently compiled or recompiled. The mechanism in play here is non-trivial, and deserves a closer look.

 

When a query plan is compiled, the process places schema stability (Sch-S) locks on all statistics metadata objects used by the query optimizer, to ensure that they are not modified during compilation. If, during compilation, the same or a different process finds that one of these statistics is stale, it will trigger an asynchronous statistics update, using a separate (background) process. That process will scan data to build new statistics, and will then attempt to acquire a schema modification (Sch-M) lock on the statistics metadata object to update it in the database. But this background process can get blocked by a query that is already holding the Sch-S lock on the same statistics metadata object, because its plan is being compiled at that time. At this point, new queries being compiled may also need to use the same statistics and attempt to place Sch-S locks on the same object, but they would be blocked behind the background process waiting for the Sch-M lock. A blocking chain would form as the result, limiting application concurrency, as shown in Figure 1.

 

wlp-disabled.png

Figure 1. An illustration of a blocking chain forming when asynchronous statistics update is waiting for a Sch-M lock at normal priority.

 

The impact of this blocking chain could be severe, especially with frequent (re)compilations and statistics updates. For some workloads we have seen, blocking would not resolve for several minutes.

 

An example in the Appendix demonstrates this scenario.

 

Asynchronous statistics update with low priority wait

 

This problem can now be fixed in Azure SQL Database and Azure SQL Managed Instance by enabling the ASYNC_STATS_UPDATE_WAIT_AT_LOW_PRIORITY database-scoped configuration. This modifies the behavior of the background update statistics process to wait for the Sch-M lock at low priority. Low priority here means that this request will not acquire the Sch-M lock on the statistics metadata object until all other processes holding incompatible locks on the same object release their locks. Thus, the background process will not update statistics until all running queries finish compilation and release their Sch-S locks (see Figure 2). The low priority mechanism used here is the same as what is used in several other DDL commands in SQL Server, such as online index rebuild and partition switching. The example in the Appendix demonstrates this mechanism in action.

 

wlp-enabled.png

Figure 2. An illustration of the absence of query blocking when asynchronous statistics update is waiting for Sch-M lock at low priority.

 

But what if multiple concurrent query compilations continue for a long time? This could mean that at any point, there is at least one Sch-S lock on the statistics metadata object. For some very intensive workloads we have seen, this does happen. In this case, the background statistics update could wait for a long time as well. Since there is a limited number of background threads, this could negatively impact other parts of the database engine and hold resources unnecessarily. To avoid this, the background process will time out after several minutes of waiting at low priority.

 

This necessary timeout means that in rare cases, stale statistics would not be updated. Because this can lead to suboptimal query plans, the default value of the ASYNC_STATS_UPDATE_WAIT_AT_LOW_PRIORITY database-scoped configuration is OFF.

 

Asynchronous statistics update operations that wait at low priority can be monitored using the autostats_update_async_low_priority_status extended event. This event fires when the operation starts, completes successfully, times out, or fails. The stats_update_correlation_id field can be used to correlate multiple events for the same statistics update operation.

 

On secondary replicas, changes to statistics that happen on the primary are automatically persisted as part of applying transaction log records. The ASYNC_STATS_UPDATE_WAIT_AT_LOW_PRIORITY configuration has no effect on that process. However, this configuration, if enabled on a secondary replica, applies to temporary statistics in the same fashion as it does to permanent statistics on the primary.

 

Conclusion

 

The new option to wait for lock at low priority in asynchronous statistics update solves a specific type of workload concurrency problem that has been affecting several Azure SQL and SQL Server customers. This solution is now available in Azure SQL Database and Azure SQL Managed Instance, and is planned to be included in SQL Server vNext. For customers seeing blocking chains where a background process waits for a Sch-M lock on a statistics metadata object, we recommend enabling the ASYNC_STATS_UPDATE_WAIT_AT_LOW_PRIORITY database-scoped configuration to improve workload concurrency.

 

We welcome your questions and feedback on this improvement via comments to this blog post, or via email to IntelligentQP [at] microsoft [dot] com.

 

Appendix

 

The script below shows an example of a blocking chain that could occur with asynchronous statistics update enabled, and then shows how the new option to wait for a Sch-M lock at low priority avoids it. Execute the script one step at a time.

 

Note that on instances with lock partitioning enabled, the likelihood of encountering a blocking chain becomes lower as the number of schedulers and lock partitions increases.

 

With lock partitioning enabled, the Sch-S lock held by the long-running query in Step 2 will be in a particular lock partition, say partition 4, because shared access requires a Sch-S lock in only one lock partition. When the background process updating statistics attempts to acquire a Sch-M lock, it will do that on all lock partitions one at a time, starting from lock partition 0 to avoid deadlocks. But it will get blocked once it gets to lock partition 4 where the long-running query is holding the Sch-S lock, and will not proceed to lock partitions with higher numbers.

 

When we execute the query in Step 6 that attempts to acquire a Sch-S lock on statistics metadata to compile its plan, it may be able to do so on a lock partition with a higher number than the ones where background process is holding or waiting for a Sch-M lock, i.e. it can acquire the Sch-S lock on partition 5 and above. Thus, since this is the only lock the query needs for shared access, the query would not be blocked. As the total number of lock partitions increases, the likelihood of finding a lock partition without a Sch-M lock increases too, making the blocking chain less likely to occur.

To reproduce the problem reliably, use an Azure SQL database with 1 scheduler, e.g. use GP_Gen4_1 service objective.

 

First, to set up the scenario, enable asynchronous statistics update, and make sure that the configuration to wait at low priority for asynchronous statistics update is disabled.

 

 

ALTER DATABASE CURRENT SET AUTO_UPDATE_STATISTICS_ASYNC ON;
ALTER DATABASE SCOPED CONFIGURATION SET ASYNC_STATS_UPDATE_WAIT_AT_LOW_PRIORITY = OFF;

 

 

Step 1

Create test table and statistics.

 

 

DROP TABLE IF EXISTS dbo.stats_wlp;

CREATE TABLE dbo.stats_wlp
(
c1 varchar(200) NOT NULL CONSTRAINT df_stats_wlp_c1 DEFAULT ('Hello World.')
);

CREATE STATISTICS stats_wlp_stat1 ON dbo.stats_wlp (c1);

 

 

Populate test table with 30000 rows.

 

 

INSERT INTO stats_wlp (c1)
SELECT TOP (30000) 'Hello World.' AS c1 
FROM sys.all_columns AS ac1
CROSS JOIN sys.all_columns AS ac2;

 

 

Step 2

Start a long-running query. This query places a Sch-S lock on the statistics metadata object to read statistics properties. This simulates a query taking this lock to compile its plan. Note the object_id and stats_id values for the statistics metadata object in the result set. They will be appearing in the following steps.

 

 

SELECT t1.c1, s.object_id, s.stats_id
FROM dbo.stats_wlp t1
CROSS JOIN dbo.stats_wlp t2
CROSS JOIN dbo.stats_wlp t3
CROSS JOIN dbo.stats_wlp t4
CROSS JOIN sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.name = 'stats_wlp_stat1';

 

Step 3

Using a separate session (query window), observe a granted Sch-S lock on the statistics metadata resource with the same object_id and stats_id, held by the long-running query. Here and below, screenshots show a subset of columns for brevity.

 

 

SELECT l.resource_type, l.resource_subtype, l.resource_description, l.request_mode, l.request_type, l.request_status, l.request_session_id, l.resource_lock_partition, r.blocking_session_id, r.command, r.status, r.wait_type
FROM sys.dm_tran_locks AS l
INNER JOIN sys.dm_exec_requests AS r
ON l.request_session_id = r.session_id
WHERE resource_type = 'METADATA'
      AND
      resource_subtype = 'STATS';

 

 

step3-1.png

Step 4

Using another separate session (query window), execute a query that will compile a new plan and trigger an asynchronous statistics update due to significant changes in data distribution (recall that we inserted 30000 rows). The query completes quickly because it does not wait for the statistics update to complete.

 

 

SELECT COUNT(1) AS cnt
FROM dbo.stats_wlp
WHERE c1 LIKE '%Hello%' 
OPTION (RECOMPILE);

 

Step 5

 

Use the query from Step 3 to look at locks again. See that now there is a background process attempting to acquire a Sch-M lock on the same statistics metadata object, with request_status set to CONVERT.

 

 

SELECT l.resource_type, l.resource_subtype, l.resource_description, l.request_mode, l.request_type, l.request_status, l.request_session_id, l.resource_lock_partition, r.blocking_session_id, r.command, r.status, r.wait_type
FROM sys.dm_tran_locks AS l
INNER JOIN sys.dm_exec_requests AS r
ON l.request_session_id = r.session_id
WHERE resource_type = 'METADATA'
      AND
      resource_subtype = 'STATS';

 

 

step5-1.png

Step 6

Run the query from step 4 again. This time, it is blocked, because it needs to acquire the Sch-S lock on the statistics metadata object to compile its plan, but it gets queued behind the background process waiting for the Sch-M lock.

 

 

SELECT COUNT(1) AS cnt
FROM dbo.stats_wlp
WHERE c1 LIKE '%Hello%' 
OPTION (RECOMPILE);

 

 

step6-1.png

 

Cancel the long-running query started in Step 2 to resolve the blocking chain.

 

Now let’s enable the configuration to wait at low priority for asynchronous statistics update, to see how it changes this scenario.

 

 

ALTER DATABASE SCOPED CONFIGURATION SET ASYNC_STATS_UPDATE_WAIT_AT_LOW_PRIORITY = ON;

 

 

Repeat steps 1-5. In step 5, see that for the background process waiting for Sch-M lock to update statistics asynchronously, request_status is now LOW_PRIORITY_CONVERT, and wait_type is LCK_M_SCH_M_LOW_PRIORITY.

step5-2.png

 

In step 6, you can see that the query is no longer blocked and completes quickly, because it is able to acquire the Sch-S lock on the statistics metadata object in front of the background process waiting on a Sch-M lock at low priority. We can continue executing the same query multiple times without any blocking.

 

Cancel the long-running query. The query showing statistics metadata locks now returns an empty set, because as soon as we canceled the long-running query, the background process was able to acquire the Sch-M lock, update statistics successfully, and release the lock.

 

2 Comments
Co-Authors
Version history
Last update:
‎Jun 16 2021 09:07 PM
Updated by: