Sharepoint disaster recovery test failure - SQL issue

Brass Contributor

My company recently tested our sharepoint disaster recovery plan, and well, let's just say it was a disaster.

 

We have two sites, and each as two WFE's and 2 app servers.  There is a sql box at each site, configured to fail over from the primary to the secondary.

 

The load balancers point to the primary site's 2 WFE's, and then if those don't respond will fail over to the other sites's 2 WFE's.

 

When we started the test the DBA's just failed over the sql box to the other site, but left the primary server running.  We then shut down all the primary sites WFE's and App servers.  When we did that we found that we couldn't connect to SP any longer.

 

The error we were getting in the logs was this:

SPSqlLatencyMonitor.PingServer: exception encountered on database: (removed)_Config and server: (removed) : System.Data.SqlClient.SqlException (0x80131904): The target database, '(removed)_Config', is participating in an availability group and is currently not accessible for queries. Either data movement is suspended or the availability replica is not enabled for read access. To allow read-only access to this and other databases in the availability group, enable read access to one or more secondary availability replicas in the group. For more information, see the ALTER AVAILABILITY GROUP statement in SQL Server Books Online. Cannot continue the execution because the session is in the kill state. A severe error occurred on the current command. The results, if any, should be discarded.

 

After a bit we actually shut down the sql box, and then the errors changed to this:

Unexpected SQL Error: System.Data.SqlClient.SqlException (0x80131904): A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible.

 

So it looks like we have the SQL set up done wrong, because even with the server failed over SP was still trying to talk to the primary sql box.

 

I've read through a whole bunch of MS docs sites on how to set up the redundant configuration, but they are all "high level" architecture type docs.  Does anyone know of a good point by point description of setting up a sql listener in SP and making sure that it is set up correctly?  From what I can see we are missing something in that regard.  Something is still trying to talk directly to the primary box.

 

Thanks in advance for your help.

2 Replies
The first really big question is, was SharePoint setup using a SQL Alias or is it using the SQL servers actual name?

To support DR scenarios there are many things to consider, but most start the the initial configuration of SharePoint. The failover process should include a DNS change to tell the SharePoint servers how to find the SQL Server when failover occurs.

I understand your frustration with your sharepoint disaster recovery plan. It's concerning that even with the sql box failed over, your SharePoint still couldn't connect to the database. Have you tried reaching out to Microsoft support for assistance in setting up the sql listener correctly? It may be beneficial to have an expert review your configuration and provide guidance on how to properly set up a redundant configuration.