Blog Post

Azure Database Support Blog
3 MIN READ

Lesson Learned #337: HADR_SYNC_COMMIT wait type delay in Azure SQL Database

Jose_Manuel_Jurado's avatar
Mar 24, 2023

Today, I worked on a service request that our customer faced a performance issue in their business critical service tier database. Their main suspect was that the syncronization with the secondary replica has a delay. 

 

During their troubleshooting process they found that they got many request with the wait type is HADR_SYNC_COMMIT. This wait type indicates the time that the primary replica spends waiting for the secondary replica to harden the log records (LSN). This wait type only occurs on the primary replica and only inside a synchronous replication that we could have for every internal replica when we created a business critical or premium service tier database. 

 

As soon as a transaction is received in the primary replica and it sends to the secondary replica for hardering, the wait type HADR_SYNC_COMMIT wait time starts counting the time until receiving the confirmation for the secondary's transaction log was completed

 

Following, I would like to share with you how is possible to reproduce this and how is posible to know the latency. 

 

I created this table including the option OPTIMIZE_FOR_SEQUENTIAL_KEY=ON for improvements when we are adding incremental values.

 

 

CREATE TABLE [MSxyzTest].[_x_y_z_MS_HighLogIO](
	[DataInsert] [nvarchar](max) NULL,
	[ID] [int] IDENTITY(1,1) NOT NULL,
 CONSTRAINT [PK__x_y_z_MS_HighLogIO] PRIMARY KEY CLUSTERED 
(
	[ID] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, OPTIMIZE_FOR_SEQUENTIAL_KEY = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO

 

 

In order to reproduce a high number of transactions, I developed the following store procedure:

 

 

CREATE OR ALTER Procedure RunATest
as 
	DECLARE @TOP AS INT 
	DECLARE @TSQL AS VARCHAR(4000)
	SET @TOP = CEILING(RAND()*(1-180)+180)
	SET @TSQL = 'INSERT INTO [MSxyzTest].[_x_y_z_MS_HighLogIO] (dataInsert)'
	SET @TSQL = @TSQL + 'values(replicate(CONVERT(NVARCHAR(4000),'
	SET @TSQL = @TSQL + CHAR(39) + 'M' + CHAR(39)+ '),4000))'
    exec (@tsql)

 

 

Finally, using the tool oStress, we are going to run 798 concurrent process (my business critical has 8 vCores and 800 workers limit) the store procedure 10000 times

 

 

"C:\Program Files\Microsoft Corporation\RMLUtils\ostress.exe" -dddbb -Q"EXEC dbo.RunaTest" -Sservername.database.windows.net -Uusername -n798 -r10000 -q -l60 -Ppassword

 

 

Running this process we could see the first HADR_SYNC_COMMIT using the following query:

 

 

select req.wait_resource as Resource,
req.status, tasks.session_id,  task_state, 
wait_type, wait_time, req.last_wait_type, cpu_time, dop,
req.command, 
blocking_session_id,  
substring
(REPLACE
(REPLACE
(SUBSTRING
(ST.text
, (req.statement_start_offset/2) + 1
, (
(CASE statement_end_offset
WHEN -1
THEN DATALENGTH(ST.text)
ELSE req.statement_end_offset
END
- req.statement_start_offset)/2) + 1)
, CHAR(10), ' '), CHAR(13), ' '), 1, 512) AS statement_text,
sched.status,
 * from sys.dm_exec_requests req
join sys.dm_os_workers work on req.task_address = work.task_address 
join sys.dm_os_tasks tasks on req.session_id = tasks.session_id 
join sys.dm_os_schedulers sched on sched.scheduler_id = tasks.scheduler_id
CROSS APPLY sys.dm_exec_sql_text(req.sql_handle) as ST
where req.status <> 'background' and req.session_id<> @@spid
AND req.wait_type LIKE 'HADR_SYNC%'
order by wait_resource,req.session_id, req.status

 

 

For every execution we could see, in the following image, that gave us, as an example, the latency per request.

 

 

Also, using the following query:

 

 

select * from sys.dm_os_wait_stats where wait_type = 'HADR_SYNC_COMMIT'

 

 

 

The Waiting Task Count column give us the number of single transactions done until now, if we divide by wait_time_ms migh give us the milliseconds in average that took for syncronization for secondary. In this case, 676004/372743 = 1.8 milliseconds

 

Finally, our customer asked if could be possible to have information about CPU, IO and workers used. We replied that using the query SELECT * FROM SYS.DM_DB_RESOURCE_STATS you could have the following information:

 

 

Enjoy!

Updated Mar 25, 2023
Version 2.0
  • EitanBlumin - You would resolve it if this were a problem, this is for information only. for instance, database with zone redundancy should have higher value of this wait type as we should ship the log records for another zone which has some more distance compared to secondary hosted in the same cluster. Anyway, it is not an indication of a problem.

    (In extremely rare cases this might indicate network slowness between database replicas which you will notice by seeing that replica is lagging, then anyway you would create a support request)