Recently I had the opportunity to work on a very interesting case regarding Azure SQL where the duration of a specific query was normally around 1 second, but sometimes this duration increased significantly (between 8 and 15 minutes).
By analyzing the execution plans, we were able to understand that the execution time was always low, but the compile time that greatly increased on the problematic executions:
These greater compile times occurred when the query was executed for the first time in the morning, after a heavy nightly workload being executed on the database. On subsequent executions, compilation times were extremely low.
After a preliminary analysis of the query and the operators on the execution plan, we didn’t find any particular anti-pattern that could clearly justify the experienced behavior.
With that, we started investigating the wait types associated with those problematic executions, and we were able to confirm that WAIT_ON_SYNC_STATISTICS_REFRESH was a dominant wait. This particular wait type occurs when queries wait for a synchronous statistics update to complete, before the compilation and execution can resume.
Accordingly, we were able to confirm that multiple statistics were being automatically updated around these execution times. Information about the last update time for the statistics used in a specific query can be seen on the XML execution plan, under OptimizerStatsUsage > StatisticsInfo. The image below depicts an example of how this information is displayed. Similarly, this information can be queried using DMVs, as it will be posteriorly presented.
The StatisticsInfo also includes the specific name for the used statistics and the respective table (also schema and database) to which they belong, and it includes the modification counter and sampling percent for each statistic.
For a wider view on the statistics update pattern and additional information regarding the underlying tables, we can also use sys.stats and sys.dm_db_stats_properties. On the support case, this allowed us to easily cross the statistics update pattern with the data volume of the respective tables, and confirm that some of these tables had several billion rows – which could expectably contribute for longer statistic update times.
An example of how to use the aforementioned DMVs to query this information for a particular table is shown below.
SELECT
OBJECT_NAME(stat.[object_id]) AS [TableName],
sp.stats_id, name,
sp.last_updated,
rows,
rows_sampled,
steps,
unfiltered_rows,
modification_counter
FROM
sys.stats AS stat
CROSS APPLY
sys.dm_db_stats_properties(stat.[object_id], stat.stats_id) AS sp
WHERE
stat.[object_id] = OBJECT_ID('MySchemaName.MyTableName');
The following image represents the output from the above query using an AdventureWorks test database (table OrderTracking), to illustrate the output:
Based on the above findings, we have decided to enable AUTO_UPDATE_STATISTICS_ASYNC. By default, the asynchronous statistics update option is OFF, and the Query Optimizer updates statistics synchronously.
With synchronous statistics updates, queries always compile and execute with up-to-date statistics. When statistics are out-of-date, the Query Optimizer waits for updated statistics before compiling and executing the query.
Contrarily, with asynchronous statistics updates, queries compile with existing statistics even if the existing statistics are out-of-date. The Query Optimizer could choose a suboptimal query plan if statistics are out-of-date when the query is compiled. Statistics are typically updated soon thereafter.
We have enabled it using the following command:
ALTER DATABASE [MyDatabaseName] SET AUTO_UPDATE_STATISTICS_ASYNC ON
When doing this, it is very important to confirm that Auto Update Statistics and Auto Update Statistics Asynchronously options are both set to True. For example, using the SSMS GUI you can right-click on the respective database and select Properties. The following should be displayed under “Options” Tab:
After enabling Auto Update Statistics Asynchronously, the observed high compilation times ceased to occur.
Note: For more information regarding the benefits and the specific scenarios where the use of AUTO_UPDATE_STATISTICS_ASYNC is recommended, check Statistics - SQL Server | Microsoft Docs.
It’s worth mentioning that, before using AUTO_UPDATE_STATISTICS_ASYNC, there are some specificities that need to be understood.
The first and perhaps most obvious, is that allowing plans to be compiled with out-of-date statistics can lead to suboptimal performance, particularly when those stale statistics are far from reflecting the current data distribution. The resulting execution plan may not be suitable for the actual workload and performance is impacted. With that being said, it’s also important to mention that an execution plan built upon outdated statistics may be still good enough – as it was confirmed on our support case. But this is necessarily an empiric process, that will require monitoring and attention.
The other aspect is regarding concurrency. The asynchronous statistics update is performed by a background request that, when the request is ready to write updated statistics to the database, attempts to acquire a schema modification lock on the statistics metadata object. If a different session is already holding a lock on the same object, asynchronous statistics update is blocked until the schema modification lock can be acquired. Similarly, sessions that need to acquire a schema stability (Sch-S) lock on the statistics metadata object to compile a query may be blocked by the asynchronous statistics update background session, which is already holding or waiting to acquire the schema modification lock. Therefore, for workloads with very frequent query compilations and frequent statistics updates, using asynchronous statistics may increase the likelihood of blocking. In such cases, we can leverage the use of ASYNC_STATS_UPDATE_WAIT_AT_LOW_PRIORITY database-scoped configuration, which enforces that this request won’t acquire the Sch-M lock on the statistics metadata object until all other processes holding incompatible locks on the same object release their locks. A detailed explanation of this scenario can be found in this great article - Improving concurrency of asynchronous statistics update - Microsoft Tech Community.
Since AUTO_UPDATE_STATISTICS_ASYNC may not be the best fit for all case scenarios, alternative approaches can be considered.
For example, manually updating statistics more frequently, to avoid triggering the statistics auto-update. More information about how to maintain Azure SQL Indexes and Statistics can be found in this article: How to maintain Azure SQL Indexes and Statistics - Microsoft Tech Community.
Disabling auto update statistics would also have reduced the high durations that were being observed, since it would avoid the update to begin with. Auto update can be disabled at different levels and using different commands, as listed below:
However, it is important to stress that for most workloads it’s advisable to allow the auto update statistics to occur. Therefore, this should be considered as a last resort and, when testing this approach, it’s recommended to start at the index / query level to have greater flexibility while minimizing the potential drawbacks. Moreover, in such cases it is crucial to have an index / statistics maintenance strategy in place (as mentioned in How to maintain Azure SQL Indexes and Statistics - Microsoft Tech Community) to ensure that statistics are being properly maintained.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.