In a previous blog post Ross Smith IV had explained what the Replay Lag Manager is and what it does. It's a great feature that's somewhat underappreciated. We've seen a few support cases that seemed to have been opened out of the misunderstanding of what the Replay Lag Manager is doing. I wanted to cover a real world scenario I had dealt with recently with a customer that I believe will clarify some things.
What is a Replay Lag Manager?
In a nutshell, Replay Lag Manager provides higher availability for Exchange through the automatic invocation of a lagged database copy. To further explain, a lagged database copy is a database that Exchange delays committing changes to for a specified period of time. The Replay Lag Manager was first introduced in Exchange 2013 and is actually enabled by default beginning with Exchange 2016 CU1. To understand what it is let's look at the Preferred Architecture (PA) in regards to a database layout. The PA uses 4 database copies like the following:
- When a low disk space threshold (10,000MB) is reached
- When the lagged DB copy has physical corruption and needs to be page patched
- When there are fewer than three available healthy HA copies for more than 24 hours
When things are less than perfect…
In the real world we don't always see Exchange setup according to our Preferred Architecture because of environment constraints or business requirements. There was a recent case that was the best example of Lag Replay Manager working in the real world. The customer had over 100 DB's, all with 6 copies each. There were 3 copies in the main site and 3 copies in the Disaster Recovery site with one of those copies at each site being lagged. The DB copies were configured like this for all databases.![clip_image002[5] clip_image002[5]](https://techcommunity.microsoft.com/legacyfs/online/media/2018/01/clip_image0025_thumb.jpg)
Get-MailboxDatabaseCopyStatus DB1\SERVER3 | Select Id,ReplayLagStatus
Id : DB1\SERVER3
ReplayLagStatus : Enabled:False; PlayDownReason:LagDisabled; ReplaySuspendReason:None;
Percentage:0; Configured:7.00:00:00; MaxDelay:1.00:00:00; Actual:00:01:22
Time: 11/31/2017 3:32:55 PM
ID: 708
Level: Information
Source: Microsoft-Exchange-HighAvailability
Machine: server3.domain.com
Message: Log Replay for database 'DB1' is replaying logs in the replay lag range. Reason: Replay lag has been disabled. (LogFileAge=00:06:00.8929066, ReasonCode=LagDisabled)
Time: 11/31/2017 3:32:55 PM
ID: 2001
Level: Warning
Source: Microsoft-Exchange-HighAvailability
Machine: server3.domain.com
Message: Database scanning during passive replay is disabled on 'DB1'. Explanation: FastLagPlaydownDesired.
Time: 11/30/2017 1:50:15 PM
ID: 738
Level: Information
Source: Microsoft-Exchange-HighAvailability
Machine: server3.domain.com
Message: Replay Lag Manager suppressed a request to disable replay lag for database copy 'DB1\SERVER3' after a suppression interval of 1.00:00:00. Disable Reason: There were database availability check failures for database 'DB1' that may be lowering its availability. Availability Count: 3. Expected Availability Count: 3. Detailed error(s):
SERVER4:
Server 'server4.domain.com' has database copy auto activation policy configuration of 'Blocked'.
SERVER5:
Server 'server5.domain.com' has database copy auto activation policy configuration of 'Blocked'.
SERVER6:
Server 'server6.domain.com' has database copy auto activation policy configuration of 'Blocked'.
It's Replay Lag Manager doing it… but why?
The entire reason for this blog post comes out of the fact that we've seen the Replay Lag Manager blamed for not letting a lagged copy lag. So, the next step someone will do is to disable it. Please don't do that! It only wants to help! Let's look at how we can resolve the our above example. The logs are showing that it's expecting 3 copies but there aren't 3 available. How can that be? They have at least 4 copies of this database available?!? If we run the following command we see a hint at culprit.Get-mailboxdatabasecopystatus DB1 | Select Identity,AutoActivationPolicy
Identity AutoActivationPolicy
-------- --------------------
DB1\SERVER1 Unrestricted
DB1\SERVER2 Unrestricted
DB1\SERVER3 Unrestricted - Lagged Copy (Not lagging)
DB1\SERVER4 Blocked
DB1\SERVER5 Blocked
DB1\SERVER6 Blocked - Lagged Copy (Working)
![clip_image002[8] clip_image002[8]](https://techcommunity.microsoft.com/legacyfs/online/media/2018/01/clip_image0028_thumb.jpg)
That's cool… how do I fix it?
There are essentially two ways to resolve this example and allow that lagged copy at Site A to properly lag. The first way is to revisit the decision to block Auto Activation at Site B. The mindset in this particular instance was that their other site was actually for Disaster Recovery. They wanted some manual intervention if databases needed to fail over to the DR site. That's all well and good but it doesn't allow for a lagged copy at Site A to work properly due to the Replay Lag Manager. The customer did actually end up allowing 1 copy at the DR site (site B in our example) for Auto Activation. To do this you can run the following command:Set-Mailboxdatabasecopy SERVER4\DB1 -DatabaseCopyAutoActivationPolicy Unrestricted
The other option here would be to create another database copy at Site A. Obviously, that's going to require a lot more effort and storage. However, doing this would allow for the Replay Lag Manager to resume lagging on the lagged database copy. I hope this post clarifies some things in regards to the Replay Lag Manager. It's a great feature that will provide some automation in keeping your Exchange databases highly available. Michael Schatte