Failing back from DR site after primary site is back online

mssql-support · ‎Jan 15 2019

First published on MSDN on Apr 27, 2015

Assume steps similar to those from Manual Failover of Availability Group were used to move to the DR site.

When the Availability Group is brought online on the DR site with alter availability group…force_failover_allow_data_loss, the replication from the new primary at the DR site to the secondary(s) at the primary site will be suspended, and can only be started manually. Once the primary site is back up and the connection between the two sites is stable, then bring the servers hosting the SQL Server service up and start the SQL Server Services. Check that the cluster service is running on all nodes at the primary site and there are no error messages in the system event log for the cluster service.

After verifying the cluster is stable, change the voting back to the original setup, if it was changed while bringing the the cluster up on the DR site.

Add voting back to the nodes at primary site.

When you failed the primary replica to the DR site, you may have also adjusted the node weights of those nodes hosting availability group replicas. It is time to set the node weight back, removing the node weight from the DR site and adding it back to the nodes a the primary data center.

(Get-ClusterNode –Name "NodeName").NodeWeight=1

Remove voting from node at DR site.

(Get-ClusterNode –Name "NodeName").NodeWeight=0

Verify the current voting in the cluster.

Get-ClusterNode | fl Name,NodeWeight

Synchronize Original Secondary at primary site

Change the synchronization mode to asynchronous and start synchronization with the SQL Server instance that was the secondary at the primary site for each database in the availability group.

Alter Availability Group <agname> Modify Replica On <SQLInstance_at_DR_Site> with AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT

Alter database <dbname> SET HADR RESUME

After resuming the database synchronization, the status can be monitored using the availability group dashboard , adding the columns log send queue size (KB) and log send rate(KB/Sec) to the availability group column. If the secondary at the primary site was ahead of the secondary at the DR site, then the database will first have to go through the reverting process.

Once the databases for the availability group are synchronized, then you can change the synchronization to synchronous, if it is not already set, and failover from the DR site to the primary site.

Alter Availability Group <agname> Modify Replica On <SQLInstance_at_DR_Site> with AVAILABILITY_MODE = SYNCHRONOUS_COMMIT

Alter Availability Group <agname> failover

After the failover to the secondary on the primary site, the synchronization can be changed to asynchronous.

Alter Availability Group <agname> Modify Replica On <SQLInstance_at_DR_Site> with AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT

At this point, the original primary is still at the point of time of the failover to the DR site and the secondary at the primary site is the new primary. If there was data loss that occurred during the failover to the DR site. A utility like tablediff or third party utility can be used to identify data from the original primary that is not in the current primary.

Steps to put original primary in read only mode, if data recovery is needed

For SQL Server 2014:

The database should be in a read only state while it is in a suspended state.

After recovering any data from the original primary, then it can begin synchronization.

Alter database <dbname> SET HADR RESUME

For SQL Server 2012:

The database will need to be removed from the availability group on the original primary and restored with recovery.

Alter database <dbname> SET HADR AVAILABILITY GROUP OFF

restore database <dbname> with recovery

Synchronize Original Primary

Use the same steps for the original secondary at the primary site.

Products (50)

Special Topics (27)

Video Hub (462)

Most Active Hubs

Most Active Hubs

Video Hub

Failing back from DR site after primary site is back online