Lesson Learned #325: RESOURCE_SEMAPHORE_QUERY_COMPILE in Elastic Database Pool on Server 'XXX'

Microsoft

Feb 09, 2023

Today, I worked on a service request that our customer reported the dominant wait type of their server was RESOURCE_SEMAPHORE_QUERY_COMPILE.

First of all, I would like to add some notes about this type of RESOURCE_SEMAPHORE . It is responsible for granting memory for certain operations, like sort, joins, etc.. Among them, we have RESOURCE_SEMAPHORE_QUERY_COMPILE that is responsible for memory grants that are needed during the compilation process of a query, excluding the memory needed for query execution.

For memory compilation we have different groups/slots depending on how much memory is needed for the compilation (small, medium and big). If the memory compilation is less a specific threshold for the small slot the query does not have to pass through this group/slot.

Other topic is the amount of compilations or query that can move throught the slot at the same time is calculated by the number of logical processors of your Azure SQL DB or Azure SQL Managed Instance.

Altought we could apply for a standalone database, we could see this wait stat for dense elastic database pool or Azure SQL Managed Instance when the queries are not enough parametrized or we have queries that the statistics are updating frecuently, different conections with different SET options, adhoc queries, etc..

The purpose of these group/slots are to maintain under control the compilation memory. The amount of memory for static is fixed but for medium and big are dynamic.

If you want to see the memory assigned for these 3 groups basically, let's run the command DBCC MEMORYSTATUS and you could see them. The configured units will be the maximum number of concurrent compilations allowed and the threshold value will be the amount of memory in bytes.

Small Gateway (internal)                                                 Value
------------------------------------------------------------------------ -----------
Configured Units                                                         8
Available Units                                                          8
Acquires                                                                 0
Waiters                                                                  0
Threshold Factor                                                         380000
Threshold                                                                380000

(6 rows affected)

Medium Gateway (internal)                                                Value
------------------------------------------------------------------------ -----------
Configured Units                                                         2
Available Units                                                          2
Acquires                                                                 0
Waiters                                                                  0
Threshold Factor                                                         12
Threshold                                                                -1

(6 rows affected)

Big Gateway (internal)                                                   Value
------------------------------------------------------------------------ -----------
Configured Units                                                         1
Available Units                                                          1
Acquires                                                                 0
Waiters                                                                  0
Threshold Factor                                                         8
Threshold                                                                -1

In this situation, we found 3 important things that explain this wait stats:

Customer has 450 databases up to 500 in an elastic database pool with a huge AdHoc qureies that will be compiled.
Automatic update statistics enabled.
Also, a memory pressure, that may caused the issue, because, the thresholds of the groups/slots for medium and big will be lowered.

To explain this, once we have a briefly explanation about what does mean, let's try to give an example. For example, I'm going to run in multiple threads the following stored procedure running Adhoc.

CREATE OR ALTER PROC [dbo].[TempTable]
as
declare @id as int
declare  as varchar(20)
declare @sql varchar(max)
set @id=(RAND()*(2000000000-1)+1);
set =convert(varchar(20),@id)
set @sql = '
SELECT 
TOP (' +  + ') *
FROM 
(

SELECT [t3].[Fiscal Month Label] AS [c43],SUM([t6].[Total Excluding Tax])
 AS [a0],SUM([t6].[Total Including Tax])
 AS [a1], COUNT(*) AS [A6],' +  + ' AS [A66]
FROM 
((
select [$Table].[Sale Key] as [Sale Key],
    [$Table].[City Key] as [City Key],
    [$Table].[Customer Key] as [Customer Key],
    [$Table].[Bill To Customer Key] as [Bill To Customer Key],
    [$Table].[Stock Item Key] as [Stock Item Key],
    [$Table].[Invoice Date Key] as [Invoice Date Key],
    [$Table].[Delivery Date Key] as [Delivery Date Key],
    [$Table].[Salesperson Key] as [Salesperson Key],
    [$Table].[WWI Invoice ID] as [WWI Invoice ID],
    [$Table].[Description] as [Description],
    [$Table].[Package] as [Package],
    [$Table].[Quantity] as [Quantity],
    [$Table].[Unit Price] as [Unit Price],
    [$Table].[Tax Rate] as [Tax Rate],
    [$Table].[Total Excluding Tax] as [Total Excluding Tax],
    [$Table].[Tax Amount] as [Tax Amount],
    [$Table].[Profit] as [Profit],
    [$Table].[Total Including Tax] as [Total Including Tax],
    [$Table].[Total Dry Items] as [Total Dry Items],
    [$Table].[Total Chiller Items] as [Total Chiller Items],
    [$Table].[Lineage Key] as [Lineage Key]
from [Fact].[Sale] as [$Table]
) AS [t6]

 INNER JOIN -- Review avoid using Left Join if possible.

(
select [$Table].[Date] as [Date],
    [$Table].[Day Number] as [Day Number],
    [$Table].[Day] as [Day],
    [$Table].[Month] as [Month],
    [$Table].[Short Month] as [Short Month],
    [$Table].[Calendar Month Number] as [Calendar Month Number],
    [$Table].[Calendar Month Label] as [Calendar Month Label],
    [$Table].[Calendar Year] as [Calendar Year],
    [$Table].[Calendar Year Label] as [Calendar Year Label],
    [$Table].[Fiscal Month Number] as [Fiscal Month Number],
    [$Table].[Fiscal Month Label] as [Fiscal Month Label],
    [$Table].[Fiscal Year] as [Fiscal Year],
    [$Table].[Fiscal Year Label] as [Fiscal Year Label],
    [$Table].[ISO Week Number] as [ISO Week Number]
from [Dimension].[Date] as [$Table]
) AS [t3] on 
(
[t6].[Delivery Date Key] = [t3].[Date]
)
)

GROUP BY [t3].[Fiscal Month Label]
)
 AS [MainTable]
WHERE 
(

NOT(
(
[a0] IS NULL 
)
)
 OR 
NOT(
(
[a1] IS NULL 
)
)

)'
 
 EXEC(@SQL)

In this test, we are going to run 35 process at the same time executing the stored procedure around 1000 times each one. We are going to see what is the impact in terms of resources and in terms of this wait stats on this database.

select * from sys.dm_db_resource_stats
select * from sys.dm_os_wait_stats order by waiting_tasks_count desc

SELECT
 substring(REPLACE(REPLACE(SUBSTRING(ST.text, (req.statement_start_offset/2) + 1, (
(CASE statement_end_offset WHEN -1 THEN DATALENGTH(ST.text) ELSE req.statement_end_offset END
- req.statement_start_offset)/2) + 1) , CHAR(10), ' '), CHAR(13), ' '), 1, 512) AS statement_text
,req.database_id
,sess.program_name
,req.session_id
, sysproc.ecid
,sysproc.status
,sysproc.waittime
,sysproc.lastwaittype
, sysproc.cpu
, sysproc.sql_handle
, req.cpu_time 'cpu_time_ms'
, req.status
, wait_time
, wait_resource
, wait_type
, last_wait_type
, req.total_elapsed_time
, total_scheduled_time
, req.row_count as [Row Count]
, command
, scheduler_id
, memory_usage
, req.writes
, req.reads
, req.logical_reads, blocking_session_id

FROM sys.dm_exec_requests AS req
inner join sys.dm_exec_sessions as sess on sess.session_id = req.session_id
inner join sys.sysprocesses as sysproc on sess.session_id = sysproc.spid
CROSS APPLY sys.dm_exec_sql_text(req.sql_handle) as ST
where req.session_id <> @@SPID and sysproc.status <> 'background'
order by sess.session_id, sysproc.ecid

We saw the following details:

As you could see in terms of resources are:

In this situation, we could review several points of the query:

Increase the number of vCores to increase the memory.
Parametrized your query.

Finally, if you want to see the memory used in Azure SQL Database per pool, check these two ways to review the distribution per resource_governor.

select * from sys.dm_resource_governor_resource_pools 
select * from sys.dm_exec_query_resource_semaphores

Enjoy!

Updated Feb 09, 2023

Version 5.0

elastic database query

performance

Resource Usage

Jose_Manuel_Jurado

Microsoft

Joined November 29, 2018

View Profile

Azure Database Support Blog

Follow this blog board to get notified when there's new activity

Blog Post

Lesson Learned #325: RESOURCE_SEMAPHORE_QUERY_COMPILE in Elastic Database Pool on Server 'XXX'