When one of the above conditions is triggered, the Replication Service will initiate a play down event for the lagged database copy. However, there are times where this may not be ideal. For example, consider the scenario where there are four database copies on a disk, one passive, one lagged, and two active. Initiating a play down event on the lagged copy has the potential to impact any active copies on that disk – replaying log files generates IO and introduces disk latency as the disk head moves, which impacts users accessing their data on the active copies.
To address this concern, beginning with Cumulative Update 1 for Exchange 2016, the lagged copy’s play down activity is tied to the health of the disk by evaluating the disk’s IO latency:
If the disk’s read IO latency is above 35ms, the play down event is deferred. In the event that there is a disk capacity concern, the disk latency deferral will be ignored and the lagged copy will play down.
Once the disk’s read IO latency drops below 25ms, the play down event is resumed.
As a result, deferred lagged copy play down reduces the IO burstiness of lagged copy play down events and ensures that local active copies on the lagged copies disk are not affected. IO sizing of a lagged database copy does not change with this feature (nor does it affect the IO sizing of an active copy); you still must ensure there is available IO headroom in the event that the lagged copy becomes active.
Consider the following example:
The y axis is disk latency, measured in milliseconds. The x axis is a 24-hour period.
As you can see from the graph, between the hours of 1am to 9am, the disk IO latency is below 25ms, meaning that lagged copy replay is allowed. At 10am, the latency exceeds 35ms and this continues until about 2pm; during this time period, lagged copy replay is delayed or deferred. At 2pm, the latency drops below 25ms and lagged copy replay resumes. Latency increases again at 4pm and the process repeats itself.
By default, the maximum amount of time that a play down event can be deferred is 24 hours. You can adjust this via the following command:
Set-MailboxDatabaseCopy <database name\server> -ReplayLagMaxDelay:<value in the format of 00:00:00>
If you want to disable deferred play down, you can set the ReplayLagMaxDelay value to ([TimeSpan]::Zero).
The following events are recorded in the Microsoft-Exchange-HighAvailability/Monitoring crimson channel when log replay is deferred or resumed:
Event 750 – Replay Lag Manager requested activating replay lag delay (suspending log replay) for database copy '%1\%2' after a suppression interval of %4. Delay Reason: %6"
Event 751 – Replay Lag Manager successfully activated replay lag delay (suspended log replay) for database copy '%1\%2'. Delay Reason: %4"
Event 752 – Replay Lag Manager failed to activate replay lag delay (suspend log replay) for database copy '%1\%2'. Error: %4"
Event 753 – Replay Lag Manager requested deactivating replay lag (resuming log replay) for database copy '%1\%2' after a suppression interval of %4. Reason: %5"
Event 754 – Replay Lag Manager successfully deactivated replay lag (resumed log replay) for database copy '%1\%2'. Reason: %4
Event 755 - Replay Lag Manager failed to deactivate replay lag (resume log replay) for database copy '%1\%2'. Error: %4
Event 756 - Replay Lag Manager will attempt to deactivate replay lag (resume log replay) for database copy '%1\%2' because it has reached the maximum allowed lag duration. Detailed Reason: %5
The following events are recorded in the Microsoft-Exchange-HighAvailability/Operational crimson channel when log replay is deferred or resumed:
Event 748 – Log Replay suspend/resume state for database '%1' has changed. (LastSuspendReason=%3, CurrentSuspendReason=%4, CurrentSuspendReasonMessage=%5)