Optimizing Query Store Memory Usage for Ad hoc Workloads

David_Lyth
Jan 28, 2025

Introduction

Query Store (QDS) is great at collecting performance-tuning telemetry, and to do this it uses resources, including disk and memory. While the best practices documentation covers QDS configuration for ad hoc workloads at a general level, one of the key contributors to memory usage is the MEMORYCLERK_QUERYDISKSTORE_HASHMAP clerk, which can be significant for ad hoc workloads; fortunately, there are ways to tune this usage.

MEMORYCLERK_QUERYDISKSTORE_HASHMAP

QDS tracks queries and their plans. To identify them, it creates a hash for each query (based on the combination of statement_sql_handle, batch_sql_handle, object_id, query_parameterization_type, and context_settings_id) and keeps these hashes in memory in the QDS hashmap clerk, so that it can work efficiently with high volumes of queries.
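
For reference, the columns this hash is derived from are all visible per persisted query in sys.query_store_query; a minimal sketch to view them (run in the database whose Query Store you are inspecting):

-- The identifying columns that the QDS hash is based on, per persisted query
SELECT query_id,
       statement_sql_handle,
       batch_sql_handle,
       object_id,
       query_parameterization_type_desc,
       context_settings_id
FROM sys.query_store_query;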

This means that, in general, as the number of queries being tracked for a database’s workload increases, the size of this memory clerk grows. This is exacerbated when multiple databases share the same SQL instance or elastic pool, since they all share the same QDS hashmap clerk. QDS has memory limits both per database and per instance or elastic pool (when one is hit, the reason appears in the readonly_reason column of sys.database_query_store_options), but especially in dense elastic pools this clerk can still account for a significant amount of memory.
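
To see how much memory the hashmap clerk is currently using at the instance or pool level, and whether any database’s Query Store has been forced into read-only mode, you can query the DMVs directly; a minimal sketch (run the second query in each database of interest):

-- Instance- or elastic-pool-level size of the QDS hashmap memory clerk
SELECT SUM(pages_kb) / 1024 AS hashmap_mb
FROM sys.dm_os_memory_clerks
WHERE [type] = 'MEMORYCLERK_QUERYDISKSTORE_HASHMAP';

-- Per-database Query Store state, including why it may have switched to read-only
SELECT actual_state_desc, readonly_reason,
       current_storage_size_mb, max_storage_size_mb
FROM sys.database_query_store_options;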

While the Query Store documentation (https://learn.microsoft.com/en-us/sql/relational-databases/performance/best-practice-with-the-query-store?view=sql-server-ver16#Parameterize) recommends avoiding ad hoc queries and using parameterization, sometimes life isn’t that simple, and you need to optimize this memory usage instead.

Two groups of queries are tracked:

  • Undecided queries
  • Persisted queries

Persisted queries

These are the queries that have met the capture thresholds; QDS records execution and wait statistics for them. Control the overall number of distinct persisted queries using the QDS settings:

  • Raise the thresholds (execution count, compile CPU and execution CPU) to filter out more queries; bear in mind that collecting this telemetry is only useful if you are using it to monitor and tune the workload.

  • Reduce the stale query threshold (the default is 30 days); however, if the workload runs the same set of queries each day or week, reducing this may not reduce the total number of distinct queries very much.

To customize these thresholds, use the custom capture policy; the AUTO policy values are 30 executions, 1 second of compile CPU and 0.1 second of execution CPU within a day (https://learn.microsoft.com/en-us/sql/t-sql/statements/alter-database-transact-sql-set-options?view=sql-server-ver16#g-enable-the-query-store-with-custom-capture-policy-options). Note that if you never use the Query Store telemetry, you can set the capture mode to NONE, so that no new queries are persisted.
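
As a rough sketch of these options (using the [qds_test] database from the example later in this post; the specific threshold values are illustrative, not recommendations), raising the capture thresholds, shortening the cleanup of stale persisted queries, and switching capture off entirely look like this:

-- Raise the custom capture thresholds so fewer queries are persisted
ALTER DATABASE [qds_test]
SET QUERY_STORE (
    QUERY_CAPTURE_MODE = CUSTOM,
    QUERY_CAPTURE_POLICY = (
        STALE_CAPTURE_POLICY_THRESHOLD = 24 HOURS,
        EXECUTION_COUNT = 100,             -- AUTO default is 30
        TOTAL_COMPILE_CPU_TIME_MS = 5000,  -- AUTO default is 1000 (1 second)
        TOTAL_EXECUTION_CPU_TIME_MS = 500  -- AUTO default is 100 (0.1 second)
    )
);

-- Remove persisted queries that have not run for 7 days (the default is 30)
ALTER DATABASE [qds_test]
SET QUERY_STORE (CLEANUP_POLICY = (STALE_QUERY_THRESHOLD_DAYS = 7));

-- Or, if the telemetry is never used, stop persisting new queries altogether
ALTER DATABASE [qds_test]
SET QUERY_STORE (QUERY_CAPTURE_MODE = NONE);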

Undecided queries

To work out whether a query has met the CPU and execution thresholds, QDS calculates the hash, puts it in the hashmap clerk, and tracks the query’s CPU and execution counts (but not the more detailed runtime stats telemetry). Once a query meets one of the thresholds, it starts being persisted and its hash stays in the hashmap; if it doesn’t meet any of the thresholds within the threshold period, it is removed from the hashmap, but is put back the next time it runs, starting its undecided journey again.

This means that with the AUTO capture policy, every query that has run in the last day has its hash in the hashmap, so for an ad hoc workload where relatively few queries reach the thresholds, these undecided hashes can make up the majority of the hashmap.
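
To check which capture mode and thresholds a database is currently using, query sys.database_query_store_options; a minimal sketch, assuming a recent version (the capture_policy_* columns are available in Azure SQL Database and SQL Server 2019 onwards):

-- Current capture mode and the thresholds an undecided query must meet to be persisted
SELECT query_capture_mode_desc,
       capture_policy_execution_count,
       capture_policy_total_compile_cpu_time_ms,
       capture_policy_total_execution_cpu_time_ms,
       capture_policy_stale_threshold_hours
FROM sys.database_query_store_options;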

To reduce the number of hashes kept, reduce the STALE_CAPTURE_POLICY_THRESHOLD; the example below shows the effect. With the default of 24 hours, all of the unique hashes stay in memory and the clerk ends up at 803 MB; with the threshold set to 1 hour, the clerk stays around the 200 MB mark, because QDS evaluates the list of undecided queries every hour and removes most of them, since they haven’t met the thresholds.

Changing STALE_CAPTURE_POLICY_THRESHOLD

This demonstrates the impact of changing this QDS setting; at 1 hour (the first set of results), the hashmap memory fluctuates, while with the default of 24 hours, it steadily increases.

 

ALTER DATABASE [qds_test]
SET QUERY_STORE (
    OPERATION_MODE = READ_WRITE,
    QUERY_CAPTURE_MODE = CUSTOM,
    QUERY_CAPTURE_POLICY = (
        STALE_CAPTURE_POLICY_THRESHOLD = 1 HOURS,
        TOTAL_EXECUTION_CPU_TIME_MS = 1000000))

Run details: stale threshold is 1, hour 0, Hashmap MB 131, waiting for 00:36

Run details: stale threshold is 1, hour 1, Hashmap MB 229, waiting for 00:36

Run details: stale threshold is 1, hour 2, Hashmap MB 179, waiting for 00:36

Run details: stale threshold is 1, hour 3, Hashmap MB 276, waiting for 00:36

Run details: stale threshold is 1, hour 4, Hashmap MB 218, waiting for 00:37

Run details: stale threshold is 1, hour 5, Hashmap MB 158, waiting for 00:35

 

ALTER DATABASE [qds_test]
SET QUERY_STORE (
    OPERATION_MODE = READ_WRITE,
    QUERY_CAPTURE_MODE = CUSTOM,
    QUERY_CAPTURE_POLICY = (
        STALE_CAPTURE_POLICY_THRESHOLD = 24 HOURS,
        TOTAL_EXECUTION_CPU_TIME_MS = 1000000))

Run details: stale threshold is 24, hour 0, Hashmap MB 131, waiting for 00:36

Run details: stale threshold is 24, hour 1, Hashmap MB 255, waiting for 00:36

Run details: stale threshold is 24, hour 2, Hashmap MB 405, waiting for 00:34

Run details: stale threshold is 24, hour 3, Hashmap MB 502, waiting for 00:35

Run details: stale threshold is 24, hour 4, Hashmap MB 600, waiting for 00:37

Run details: stale threshold is 24, hour 5, Hashmap MB 803, waiting for 00:38

The code for this example is below; an inner loop runs one hundred thousand ad hoc queries once an hour for 6 hours, and this is repeated with the stale capture threshold set first to 1 hour and then to 24 hours. It was run on a single Azure SQL database with the capture CPU threshold set very high, so that the only hashmap memory being used comes from the undecided queries in this database.

set nocount on

declare @stale_hours INT = 1
declare @hours_lcv INT
declare @tsql nvarchar(300)
declare @starttime datetime
declare @delay_length varchar(8)
declare @hashmap int
declare @Max_queries INT = 100000
declare @Max_hours INT = 6
declare @lcv INT
-- loop; 1st iteration setting the stale threshold to one hour, the second to 24 hours
while @stale_hours < 25
begin
	-- clear the results from the previous run, and set the stale threshold
	ALTER DATABASE [qds_test] SET QUERY_STORE CLEAR
	set @tsql = 'ALTER DATABASE [qds_test] SET QUERY_STORE (OPERATION_MODE = READ_WRITE,QUERY_CAPTURE_MODE = CUSTOM, QUERY_CAPTURE_POLICY = (STALE_CAPTURE_POLICY_THRESHOLD = ' + cast(@stale_hours AS nvarchar(3)) + ' HOURS,TOTAL_EXECUTION_CPU_TIME_MS = 1000000))'
	select @tsql
	exec sp_executesql @tsql
	set @hours_lcv = 0
	-- repeat the hashmap bloat for a number of hours; wait after the inner loop until the hour is up
	while @hours_lcv < @Max_hours
	begin
		set @lcv = 1
		set @starttime = getdate()
		-- generate adhoc queries to bloat the QDS hash map
		while (@lcv < @Max_queries)
		begin
			set @tsql = 'select null as nothinghere into #null from sys.databases where 1=2 and database_id = ' + cast((@hours_lcv * @Max_queries) + @lcv as nvarchar(10))
			exec sp_executesql @tsql
			set @lcv += 1
		end
		SELECT @hashmap = SUM(pages_kb) / 1024 FROM sys.dm_os_memory_clerks WHERE [type] = 'MEMORYCLERK_QUERYDISKSTORE_HASHMAP'
		set @delay_length = '00:' + right('00' + cast(59 - least(59,datediff(minute,@starttime,getdate())) as varchar(2)),2)
		select 'Run details: stale threshold is ' + cast(@stale_hours AS varchar(2)) + ', hour ' + cast(@hours_lcv as varchar(2)) + ', Hashmap MB ' + cast(@hashmap as varchar(10)) + ', waiting for ' + @delay_length
		-- wait for the rest of the hour
		waitfor delay @delay_length
		set @hours_lcv += 1
	end
	set @stale_hours += 23
end

Conclusion

For workloads where you are using QDS but cannot address the ad hoc nature of the workload, reducing the stale capture policy threshold can significantly reduce the memory that QDS uses.

Feedback and suggestions

If you have feedback or suggestions for improving this data migration asset, please contact the Databases SQL Ninja Engineering Team (datasqlninja@microsoft.com). Thanks for your support!

Note: For additional information about migrating various source databases to Azure, see the Azure Database Migration Guide.

Updated Jan 29, 2025
Version 2.0