The high availability capabilities of the lagged database copy are enhanced in the upcoming release of Exchange 2016 Cumulative Update 1.
As you may recall, lagged copies can care for themselves by invoking automatic log replay to play down the log files in certain scenarios:
Play down based on copy health status requires ReplayLagManager to be enabled. Beginning with Exchange 2016 CU1, ReplayLagManager is enabled by default. You can change this via the following command:
Set-DatabaseAvailabilityGroup <DAGName> -ReplayLagManagerEnabled $false
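Before changing the setting, it can help to confirm what the DAG is currently using. The following is a minimal sketch for the Exchange Management Shell; `DAG1` is a placeholder name, not one taken from this post.

```powershell
# 'DAG1' is a placeholder; substitute the name of your own DAG.
$dagName = 'DAG1'

# Show whether ReplayLagManager is currently enabled for the DAG.
Get-DatabaseAvailabilityGroup -Identity $dagName |
    Format-List Name, ReplayLagManagerEnabled

# Re-enable ReplayLagManager if it was previously turned off.
Set-DatabaseAvailabilityGroup -Identity $dagName -ReplayLagManagerEnabled $true
```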
When one of the above conditions is triggered, the Replication Service will initiate a play down event for the lagged database copy. However, there are times when this may not be ideal. For example, consider the scenario where there are four database copies on a disk: one passive, one lagged, and two active. Initiating a play down event on the lagged copy has the potential to impact any active copies on that disk – replaying log files generates IO and introduces disk latency as the disk head moves, which impacts users accessing their data on the active copies.
To address this concern, beginning with Cumulative Update 1 for Exchange 2016, the lagged copy’s play down activity is tied to the health of the disk by evaluating the disk’s IO latency:
As a result, deferred lagged copy play down reduces the IO burstiness of lagged copy play down events and ensures that local active copies on the lagged copy’s disk are not affected. IO sizing of a lagged database copy does not change with this feature (nor does it affect the IO sizing of an active copy); you still must ensure there is available IO headroom in the event that the lagged copy becomes active.
Consider the following example:
The y axis is disk latency, measured in milliseconds. The x axis is a 24-hour period.
As you can see from the graph, between 1am and 9am, the disk IO latency is below 25ms, meaning that lagged copy replay is allowed. At 10am, the latency exceeds 35ms and this continues until about 2pm; during this time period, lagged copy replay is deferred. At 2pm, the latency drops below 25ms and lagged copy replay resumes. Latency increases again at 4pm and the process repeats itself.
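The pattern in the graph can be approximated with a small simulation. This is only an illustrative sketch, not the Replication Service's actual logic: `Get-ReplayState` is a hypothetical helper, the 25ms and 35ms thresholds are taken from the example above, and holding the previous state when latency sits between the two thresholds is an assumption.

```powershell
# Thresholds from the example above, in milliseconds.
$AllowBelowMs = 25
$DeferAboveMs = 35

# Hypothetical helper: returns one state per latency sample.
function Get-ReplayState {
    param(
        [double[]]$LatencyMs,
        [string]$InitialState = 'Allowed'
    )
    $state = $InitialState
    foreach ($sample in $LatencyMs) {
        if ($sample -gt $DeferAboveMs) {
            $state = 'Deferred'      # disk is busy; hold back log replay
        }
        elseif ($sample -lt $AllowBelowMs) {
            $state = 'Allowed'       # disk is quiet; replay may proceed
        }
        # Between the thresholds the previous state is kept (assumption).
        $state
    }
}

Get-ReplayState -LatencyMs 12, 18, 40, 30, 38, 20
```

With the sample values shown, the states are Allowed, Allowed, Deferred, Deferred, Deferred, Allowed: replay stays deferred through the 30ms sample and only resumes once latency falls below 25ms.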
By default, the maximum amount of time that a play down event can be deferred is 24 hours. You can adjust this via the following command:
Set-MailboxDatabaseCopy <database name\server> -ReplayLagMaxDelay:<value in the format of 00:00:00>
If you want to disable deferred play down, you can set the ReplayLagMaxDelay value to ([TimeSpan]::Zero).
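As a concrete illustration of both settings, the sketch below first shortens the maximum deferral and then disables deferred play down. The copy identity `DB1\MBX1` and the 12-hour value are placeholders chosen for the example.

```powershell
# 'DB1\MBX1' is a placeholder in <database name>\<server> form.
$copy = 'DB1\MBX1'

# Allow play down to be deferred for at most 12 hours instead of the default 24.
Set-MailboxDatabaseCopy -Identity $copy -ReplayLagMaxDelay ([TimeSpan]::FromHours(12))

# Disable deferred play down entirely.
Set-MailboxDatabaseCopy -Identity $copy -ReplayLagMaxDelay ([TimeSpan]::Zero)
```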
The following events are recorded in the Microsoft-Exchange-HighAvailability/Monitoring crimson channel when log replay is deferred or resumed:
The following events are recorded in the Microsoft-Exchange-HighAvailability/Operational crimson channel when log replay is deferred or resumed:
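To review recent entries on a server, both channels can be queried with `Get-WinEvent`. This is a rough sketch: the 24-hour window is arbitrary, and filtering the message text on the word "lag" is a guess at the wording, so adjust the pattern to match what your servers actually log.

```powershell
# Channel names as given above.
$channels = 'Microsoft-Exchange-HighAvailability/Monitoring',
            'Microsoft-Exchange-HighAvailability/Operational'

# Look back over the last 24 hours (arbitrary window for this example).
$since = (Get-Date).AddHours(-24)

foreach ($channel in $channels) {
    # SilentlyContinue suppresses the error Get-WinEvent raises when nothing matches.
    Get-WinEvent -FilterHashtable @{ LogName = $channel; StartTime = $since } -ErrorAction SilentlyContinue |
        Where-Object { $_.Message -match 'lag' } |   # keyword is an assumption
        Select-Object TimeCreated, Id, LogName, Message
}
```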
The changes discussed above continue our work in improving the Preferred Architecture by ensuring that users have the best possible experience on the Exchange platform.
As always, we welcome your feedback.
Ross Smith IV
Principal Program Manager
Office 365 Customer Experience