Exchange 2019 Failover with DAG Question


I am trying to get an understanding of what happened to my organization a couple of weeks ago and am hoping that someone will help me clarify how Exchange 2019 failover works with a DAG.

 

This past summer my organization upgraded to Exchange 2019 and added a DAG. We have 2 servers that host 4 databases, and my understanding is that with the DAG, messages replicate between the database copies on both servers. There is also a load balancer in front of the servers, so inbound connections go to both servers.

 

Last month the servers were scheduled for updates. Our current configuration is that all databases are active on one server and passive on the second one. We were able to update the passive server and reboot it without incident. Since it was my first time doing updates, I forgot to move the active databases over to the server that had just been updated. When the server hosting the active databases rebooted, our email system was inaccessible until that server completed its update and reboot cycle, which ended up taking about an hour.

 

That is where I am looking for help. It is my understanding that the DAG should have recognized that the active database server was down and should have activated the databases on the server that had already been updated, right? I wondered whether a validation delay or something similar was involved, so I am currently in for overnight testing. I manually switched my active databases between servers and did not experience any issues, delays, hiccups, or anything else to indicate that there was a change.

 

So that brings me to my question: what was it about rebooting the server hosting the active databases that halted access to our entire email system until that server came back up? This behavior makes me nervous about a real failure and whether email would become available again.

 

Is there anything I can do to better handle this scenario in the future or in the event of a real failure? I now know that the next time we do maintenance I will move the active databases between servers before rebooting, but during our previous outage I couldn't even access the ECP URL. I did find the PowerShell commands for performing a switchover. If the server with the active databases failed, would I be able to run the command from PowerShell and resume email operation from the second server?

 

Last, since our current configuration is that all databases are active on the same server, would it make more sense to make 2 active on each server? My idea is that if either server experienced this problem again, only half of the post office would be down rather than the whole thing.
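
The trade-off in that last question can be put in numbers. The following is a minimal Python sketch, not anything Exchange provides: the server names `EX1` and `EX2` are hypothetical, and it assumes that no automatic failover takes place, as in the outage described above.

```python
def share_offline(active_on: dict, failed_server: str) -> float:
    """Fraction of databases whose active copy was on the failed server.

    Assumption: the DAG does not fail over automatically, so every database
    that was active on the failed server stays offline.
    """
    total = sum(active_on.values())
    if total == 0:
        return 0.0
    return active_on.get(failed_server, 0) / total


# Hypothetical layouts for 4 databases on 2 servers.
all_on_one = {"EX1": 4, "EX2": 0}
split_evenly = {"EX1": 2, "EX2": 2}

print(share_offline(all_on_one, "EX1"))    # 1.0
print(share_offline(split_evenly, "EX1"))  # 0.5
```

Splitting the active copies limits the damage only while failover is broken. With a working DAG, the passive copies would be activated on the surviving server in either layout.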

 

Sorry for the wall of text but I'm getting pressure to come up with an answer and haven't been able to find one.  Thanks!

4 Replies

@mmazurkiewcz269 

Do you have a File Share Witness?

One is required if there is an even number of servers in the DAG.
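
Why the witness matters with an even member count can be shown with a small vote calculation. This is a simplified Python sketch of static majority quorum, not how Exchange or Windows failover clustering is implemented. It assumes one vote per DAG member plus one witness vote when the member count is even, and it ignores the dynamic vote adjustment that newer Windows Server versions can apply.

```python
def has_quorum(members_up: int, total_members: int, witness_reachable: bool) -> bool:
    """Simplified static majority check for a cluster with an optional witness.

    Assumptions: each member has one vote; the witness contributes one vote
    only when the member count is even; dynamic quorum is ignored.
    """
    uses_witness = total_members % 2 == 0
    total_votes = total_members + (1 if uses_witness else 0)
    votes_present = members_up + (1 if uses_witness and witness_reachable else 0)
    # Quorum needs a strict majority of all configured votes.
    return votes_present > total_votes // 2


# Two-member DAG, one member rebooting:
print(has_quorum(1, 2, witness_reachable=True))   # True: 2 of 3 votes
print(has_quorum(1, 2, witness_reachable=False))  # False: 1 of 3 votes
```

Under this model, a two-member DAG whose witness cannot be reached loses quorum as soon as either member goes down, and the surviving member cannot keep its databases mounted.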


@Neill Tinlin Yes, we have a witness server; it's on the same system as the non-Exchange server for the DAG.


@mmazurkiewcz269 

So the FSW is on another server, which isn't an Exchange server, isn't a DC, and is in the same AD domain?

 

1. Check that the domain Exchange Trusted Subsystem group is actually in the local Administrators group of the FSW. This catches a lot of people out. For example, group membership might be set via Group Policy, or some sort of security scanning utility might remove it as not being an approved group.

2. Run the Get-DatabaseAvailabilityGroup command and make a note of the FSW directory. On the FSW, check that this folder exists and that there are files in it.

3. Check that no firewalls block traffic between any of the 3 servers, especially on TCP port 445 (SMB).

4. If for some reason you are using Windows Firewall on the FSW, check that it is using the domain profile. I've seen cases where Network Location Awareness sets the wrong profile and thus blocks traffic.
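
For check 3, a quick reachability test on port 445 can be scripted. The following Python sketch uses only the standard library and is just one way to do it. It confirms only that a TCP connection can be opened; it says nothing about share permissions or whether the witness directory exists.

```python
import socket


def tcp_port_open(host: str, port: int = 445, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Run it from each Exchange server against the witness, for example `tcp_port_open("fsw01.example.local")`, where the hostname is a hypothetical placeholder for the real FSW.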

 

NT


@Neill Tinlin 

The FSW is its own dedicated server, is not one of the Exchange servers and isn't a DC, and is on the domain.

Following through the checks that you provided:

1.  It does have the Exchange Trusted Subsystem in the local admins group

2.  I had to use the web interface to get the file path for the FSW.  I found that only one of the Exchange servers has the Witness directory path physically on the system.

3/4. Firewall and Network settings are good

 

So based on this I think that my DAG isn't configured quite right.  From the web interface I can open the DAG settings and do see the Witness server in the correct field with the directory path.  Again, the directory is only present on one of the Exchange servers and not on the FSW server.  The DAG member list does show the 2 Exchange servers as well.

 

Where do I go from here?  I apologize for the questions. This was originally set up by a consultant months ago, and that person is now on another job, so has limited availability for follow-up.