Troubleshooting data movement latency between synchronous-commit AlwaysOn Availability Groups

Microsoft

Jan 16, 2019

First published on MSDN on Apr 05, 2018
Writer: Simon Su
Technical Reviewer: Pam Lahoud, Sourabh Agarwal, Tejas Shah
Applies to: SQL Server 2014 SP2, SQL Server 2016 SP1, SQL Server 2017 RTM and later

In synchronous-commit mode AG nodes sometimes you may observe your transactions are pending on HADR_SYNC_COMMIT waits. HADR_SYNC_COMMIT waits indicate that SQL server is waiting for the signal from the remote replicas to commit the transaction. To understand the transaction commit latency you can refer to below articles:

Troubleshooting High HADR_SYNC_COMMIT wait type with Always On Availability Groups - Microsoft Tech Community

SQL Server 2012 AlwaysOn – Part 12 – Performance Aspects and Performance Monitoring II - Microsoft Tech Community

In above link you will learn that the transaction delay can be evaluated by below two performance counters:

SQL Server:Database Replica –> Transaction Delay

SQL Server:Database Replica –> Mirrored Write Transactions/sec

For example, assume there are poor performing AG nodes and you see “SQL Server:Database Replica –> Transaction Delay” is 1000 ms (milliseconds), and “SQL Server:Database Replica –> Mirrored Write Transactions/sec” is 50, then it means on average each transaction has a delay of 1000ms/50=20 ms.

Given above example, can we know where the delay of 20 ms is from? What are the factors that cause this latency? To find out the answers of this kind of questions we need to understand how synchronous-commit works:

AlwaysON - HADRON Learning Series: - How does AlwaysON Process a Synchronous Commit Request - Microsoft Tech Community

To track the data movement between replicas we are lucky we have updated xevents:
KB3173156 - Update adds AlwaysOn extended events and performance counters in SQL Server 2014 or 2016 (microsoft.com)

In synchronous-commit mode the basic logic of the log blocks movement is as below:

In primary:

1.1 and 1.2 happen in parallel
-----------------------------
1.1 Log block->LogCache->LogFlush->LocalHarden(LDF)
1.2 Log block->logPool->LogCapture->SendToRemote

In remote synchronous replica:

The harden process is similar as primary:

logBlock receive->LogCache->LogFlush->HardenToLDF->AckPrimary

The xevents (see the first link in this article) happened in different places of log block movement. I have below figure to give you the detailed log movement flow of each steps and those related Xevents:

As showed above when the Xevent trace is captured we can know the precious time point of each step of the log block movement and you can know exactly where the transaction latency is from. Commonly the delay is from three parts:

1. The duration of log harden in primary

It equals to the time delta of Log_flush_start(step 2) and Log_flush_complete (step 3)

2. The duration of log harden in remote replica

It equals to the time delta of Log_flush_start (step 10) and Log_flush_complete (step 11)

3. The duration of network traffic

The sum of time deltas of (primary:hadr_log_block_send_complete ->secondary:hadr_transport_receive_log_block_message, step 6-7) and (secondary:hadr_lsn_send_complete->primary:hadr_receive_harden_lsn_message,step 12-13)

I use below script to capture xevents:

/* Note: this trace could generate very large amount of data very quickly, depends on the actual transaction rate. On a busy server it can grow several GB per minute, so do not run the script too long to avoid the impact to the production server. */

CREATE EVENT SESSION [AlwaysOn_Data_Movement_Tracing] ON SERVER
ADD EVENT sqlserver.file_write_completed,
ADD EVENT sqlserver.file_write_enqueued,
ADD EVENT sqlserver.hadr_apply_log_block,
ADD EVENT sqlserver.hadr_apply_vlfheader,
ADD EVENT sqlserver.hadr_capture_compressed_log_cache,
ADD EVENT sqlserver.hadr_capture_filestream_wait,
ADD EVENT sqlserver.hadr_capture_log_block,
ADD EVENT sqlserver.hadr_capture_vlfheader,
ADD EVENT sqlserver.hadr_db_commit_mgr_harden,
ADD EVENT sqlserver.hadr_db_commit_mgr_harden_still_waiting,
ADD EVENT sqlserver.hadr_db_commit_mgr_update_harden,
ADD EVENT sqlserver.hadr_filestream_processed_block,
ADD EVENT sqlserver.hadr_log_block_compression,
ADD EVENT sqlserver.hadr_log_block_decompression,
ADD EVENT sqlserver.hadr_log_block_group_commit ,
ADD EVENT sqlserver.hadr_log_block_send_complete,
ADD EVENT sqlserver.hadr_lsn_send_complete,
ADD EVENT sqlserver.hadr_receive_harden_lsn_message,
ADD EVENT sqlserver.hadr_send_harden_lsn_message,
ADD EVENT sqlserver.hadr_transport_flow_control_action,
ADD EVENT sqlserver.hadr_transport_receive_log_block_message,
ADD EVENT sqlserver.log_block_pushed_to_logpool,
ADD EVENT sqlserver.log_flush_complete ,
ADD EVENT sqlserver.log_flush_start,
ADD EVENT sqlserver.recovery_unit_harden_log_timestamps
ADD TARGET package0.event_file(SET filename=N'c:\mslog\AlwaysOn_Data_Movement_Tracing.xel',max_file_size=(500),max_rollover_files=(4))
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=ON)

For demo purpose I just run “insert into [AdventureWorks2014]..t1 values(1)” and then capture the xevent trace on primary and secondary, and below are the screenshot of the captured xevents:

Primary:

In second synchronous replica:

Note: You may notice that the log_block_id (146028889512) of hadr_receive_harden_lsn_message is not the same as others (146028889488). This is because the return id is always the next immediate id of the harden log block, we can use hadr_db_commit_mgr_update_harden xevent to correlate the xevents:

With above xevent log we now get below detailed time latency breakdowns of the transaction commit:

	From	To	latency
Network: Primary->Second	Primary: hadr_log_block_send_complete 2018-03-06 16:56:28.2174613	Secondary: hadr_transport_receive_log_block_message 2018-03-06 16:56:32.1241242	3.907 seconds
Network: Secondary->Primary	Secondary:hadr_lsn_send_complete 2018-03-06 16:56:32.7863432	Primary:hadr_receive_harden_lsn_message 2018-03-06 16:56:33.3732126	0.587 seconds
LogHarden(Primary)	log_flush_start 2018-03-06 16:56:28.2168580	log_flush_complete 2018-03-06 16:56:28.8785928	0.663 seconds
Log Harden(Secondary)	Log_flush_start 2018-03-06 16:56:32.1242499	Log_flush_complete 2018-03-06 16:56:32.7861231	0.663 seconds

I list the time delta (the latency) just for network and log harden process there, there could be other latency places happening as well, like log block compressing/decompressing etc, but mainly the latency comes from these three parts:

Network latency between replicas. In above example, it is 3.907+0.587=4.494 seconds

Log harden on Primary=0.663 seconds

Log harden on Secondary=0.663 seconds

To get the total transaction delay, we cannot simply sum them up because Log flush on primary and the network transfer are happening in parallel. Say, the network takes 4.494 seconds, but primary log harden finished (log_flush_complete:2018-03-06 16:56:28.8785928) far before the primary gets confirmation from replica (hadr_receive_harden_lsn_message:2018-03-06 16:56:33.3732126). Luckily, we do not need to manually determine which timestamp to use to calculate the total commit time of a transaction. We can use the time delta between the two hadr_log_block_group_commit xevents to know the time to commit. For example, in above log:

Primary: hadr_log_block_group_commit: 2018-03-06 16:56:28.2167393

Primary: hadr_log_block_group_commit: 2018-03-06 16:56:33.3732847

Total time to commit=delta of above two timestamps= 5.157 seconds

This number is equal to the network transfer time plus the log harden time on the secondary. This makes sense because the secondary has to wait for the network for the log block available before log harden, it cannot harden the log in parallel like in the primary.

If you look at the second hadr_log_block_group_commit event it has a column of “processing_time” which is the exactly the commit time of the transaction that we are talking about:

So now you have an overall picture about the log block movement between synchronous-commit mode replicas, and you know where is the latency (if any) from, replicas, network, or disk (log harden) or others.

By the way, you may notice that there is “hadr_db_commit_mgr_harden_still_waiting” xevent happening in primary Xevents. This event happens every 2 seconds (the 2 seconds is hardcoded) when the primary is waiting for the acknowledge message from second replica. If the ack comes back within 2 seconds you won’t see this xevent.

2018.9.8 update
=================
I have developed a tool to analyze the trace automatically:
AGLatency report tool V2.2 introduction | Microsoft Docs

Reference
New in SSMS - Always On Availability Group Latency Reports | Microsoft Docs
AlwaysOn Availability Groups Troubleshooting and Monitoring Guide | Microsoft Docs