I continue to run into this issue over and over in the field so I wanted people to be aware of this possible problem. In a Database Availability Group (DAG), if your databases are randomly mounting or flipping from one server to another, for no apparent reason (including across datacenters) you may be suffering from your network interface card (NIC) going to sleep. And that’s not a good thing.
Power Management on the NIC
In the power management settings for the NIC on Windows Server, make sure you are not allowing the NIC to go into power save mode. Why is this important? It seems like at least once a month I’ve run into customers who have this power management setting turned on and more than one of them even had it turned on for their replication network. They were seeing some odd behavior - for example, their databases randomly flipping from one DAG node to another for no apparent reason. And yes, they were all on physical machines.
Here are the steps to look at this configuration: use Device Manager to change the power management settings for a network adapter.
To disable all Power Management settings in Device Manager, expand Network Adapters, right-click the adapter > Properties > Power Management, and then clear the Allow the computer to turn off this device to save power check box.
Some of your network adapters may not have the Power Management tab available. This is a good thing, as your NIC is not able to go to sleep. This means there is one less item to worry about in your setup!
CAUTION Be careful when you change this setting. If it's enabled and you decide to disable it, you must plan for this modification as it will likely interrupt network traffic. It may seem odd that by just making a seemingly non-impacting change that the NIC will reset itself, but it definitely can. Trust me; I had a customer ‘test’ this during the day by accident… oops!
PowerShell to the rescue
In addition, now that PowerShell is able to be used for just about everything, there is this page that has a PS script available to make this change. There are additional links and related forum threads to review with supplementary information near the bottom of the script download page.
This modifying script will run against all physical adapters to the machines you deploy it to, and you can also modify the script to disable wireless NIC’s. With PS, don’t forget that you can use this script to blast down these changes to all of your Exchange servers with a single step.
GPO and regedit
For those of you that are more comfortable with regedit and creating GPO’s to help control these settings, that option is also available. This page has information on both ‘one off’ fixes that you can download a .reg file and manually deploy, or using GPO Preferences, you can edit the values in a GPO and apply those changes to an Exchange Server OU (Organizational Unit).
The one step to note with the regedit process is which NIC you are working with and how many NIC’s your server has. The registry only knows of the first, second, third, etc. number of NIC’s. Now if you have identical builds between all of your servers, then this option certainly will ensure that all current and any future servers placed into an OU with the GPO applied will adhere to the proper registry settings.
Also don’t forget, you can record all of your changes on a Windows Server 2008R2 or higher OS, by using the Problem Steps Recorder (PSR) tool.
There you have it: if your DAG Databases are randomly becoming active from one server to another with no apparent reason, you may have a sleepy NIC. Please confirm that you have avoided this setting as you build out not only your DAG environment, but all Exchange related servers. Thank you.
Mike O'Neill
You Had Me at EHLO.