It’s beneficial to understand how Query Store captures query runtime stats when they are used for performance troubleshooting. According to public documentation: upon the query execution, runtime statistics are sent to Query Store.
I am curious whether Query Store will still capture execution statistics under certain scenarios. Hence, in this blog post, I will be testing the two following scenarios with my Azure SQL Database:
- Whether Query Store captures runtime stats for query that has completed within an uncommitted transaction, then the transaction rolls back due to a scaling process.
- Whether Query Store captures runtime stats for query that is blocked and does not finish before a disconnect caused by a disruptive event (e.g., a service tier scale-up).
Scenarios Testing
Prerequisites
- Ensure the Query Store capture mode is set to All. (To ensure it captures all the queries it should capture.)
Scenario #1 - Query that has completed within an uncommitted transaction, then the transaction rolls back due to a scaling process.
- Run a query with an uncommitted transaction
BEGIN TRANSACTION
SELECT TOP (1000) [id],[col1],[col2]
FROM [dbo].[Table1]
WAITFOR DELAY '02:00'
- Check the Runtime stats for the query after it completes, it's found in the Query Store.
SELECT q.query_id
,q.query_text_id
,OBJECT_NAME(q.object_id) AS OBJECT_NAME
,q.query_hash
,qp.plan_id
,qp.query_plan_hash
,qt.query_sql_text
,convert(xml,qp.query_plan) as plan_xml
,qrs.last_execution_time
,qrs.avg_cpu_time
FROM sys.query_store_plan qp
INNER JOIN sys.query_store_query q ON qp.query_id = q.query_id
INNER JOIN sys.query_store_query_text qt ON q.query_text_id = qt.query_text_id
INNER JOIN sys.query_store_runtime_stats qrs ON qrs.plan_id = qp.plan_id
WHERE query_hash in ( 0xCEE3DC068B8439D0 )
- Scale up the service tier.
- After scaling completes, I check the Runtime stats for the query again, the runtime stats for the execution just now are lost.
I think it's because the runtime stats is still cached and has not been persisted to the disk before I scale up the service tier. Therefore, when the node changes(scaling), whatever runtime stats that have not been persisted to the disk, will be lost. (A public document actually mentions about this)
- Test again, and this time, manually flush the data to query store, or wait a bit longer for auto flush to happen before scaling up the database.
EXEC sp_query_store_flush_db
This time, the runtime stats are persisted and not lost after the scaling.
Scenario #2 - Query that is blocked or does not finish before a disconnect caused by a disruptive event (e.g., a service tier scale-up).
First, I test the blocking scenario to verify if Query Store does not capture the runtime stats for the query because it does not complete due to blocking
- Get the current query runtime stats for the candidate query (0x37F2DD8F0C84C585) before the testing
SELECT q.query_id
,q.query_text_id
,OBJECT_NAME(q.object_id) AS OBJECT_NAME
,q.query_hash
,qp.plan_id
,qp.query_plan_hash
,qt.query_sql_text
,convert(xml,qp.query_plan) as plan_xml
,qrs.last_execution_time
,qrs.avg_cpu_time
,qrs.last_duration
,qrs.last_cpu_time
,qrs.count_executions
,qrs.execution_type_desc
FROM sys.query_store_plan qp
INNER JOIN sys.query_store_query q ON qp.query_id = q.query_id
INNER JOIN sys.query_store_query_text qt ON q.query_text_id = qt.query_text_id
INNER JOIN sys.query_store_runtime_stats qrs ON qrs.plan_id = qp.plan_id
WHERE query_hash in ( 0x37F2DD8F0C84C585 )
order by qrs.last_execution_time asc
- Run below query to hold the table lock
BEGIN TRANSACTION
SELECT TOP (1000) [id],[col1],[col2]
FROM [dbo].[Table1] WITH (TABLOCKX)
WAITFOR DELAY '05:00'
- Run the candidate query (0x37F2DD8F0C84C585) again and you can see it being blocked.
INSERT INTO Table1 values ('A', 'B')
- Check the runtime stats for the candidate query (0x37F2DD8F0C84C585) again. No new runtime stats are generated or recorded.
- Even after manually flushing the query store data, it shows the same. (I think the runtime stats have not been generated or collected because the query does not complete.)
- Expectedly, after the scaling, the Query Store runtime stats for the candidate query show the same.
Second, I test the scenario where the query does not complete due to its long running (no blocking).
- Run an insert query that takes minutes to complete
INSERT INTO [dbo].[Table3] ([id], [col1], [col2])
SELECT [id], [col1], [col2]
FROM [dbo].[Table1];
- First, test to ensure that Query Store does capture this query when it completes - as you can see, the runtime stats can be recorded.
SELECT q.query_id
,q.query_text_id
,OBJECT_NAME(q.object_id) AS OBJECT_NAME
,q.query_hash
,qp.plan_id
,qp.query_plan_hash
,qt.query_sql_text
,convert(xml,qp.query_plan) as plan_xml
,qrs.last_execution_time
,qrs.avg_cpu_time
,qrs.last_logical_io_reads
,qrs.avg_physical_io_reads
,qrs.last_rowcount
FROM sys.query_store_plan qp
INNER JOIN sys.query_store_query q ON qp.query_id = q.query_id
INNER JOIN sys.query_store_query_text qt ON q.query_text_id = qt.query_text_id
INNER JOIN sys.query_store_runtime_stats qrs ON qrs.plan_id = qp.plan_id
WHERE query_hash in ( 0x465560FFECC0DF19 )
order by qrs.last_execution_time
- Now, delete the table and run the same insert query again.
- Check the Query Store runtime stats for the query while it is still running. Runtime Stats are not generated.
- Scale up the database before the insert query completes. (i.e., The insert query gets disrupted)
- Check Runtime stats again after the scaling - runtime stats are not generated.
- And the Table is empty. (No data have been inserted because the insert query did not complete)
Conclusions
- Once a query has completed, its runtime stats will be generated and collected in the Query Store cache. These stats will then be flushed to disk after a period of time, regardless of whether the transaction is uncommitted or rolled back. The
flush_interval_seconds
parameter specifies the flush interval. - The runtime stats for a query are only generated and collected once the query completes. This means that if the SQL database is shut down or restarted before the query’s runtime stats have been generated or collected, the stats will be lost.