SQL Server 2019 Always On problem


I have SQL Server 2019 Standard with Always On set up on two nodes, plus a file share witness located on a third server. Yesterday we had a network issue: the main switch was down for 15 minutes. After the network came back, I saw that the whole SQL Server Always On cluster had been completely deleted from the computers. How can this happen at all?



1 Reply
We have experienced similar faults when the network is unstable.
From what I understand, the failover cluster service on each node ends up in a state where it believes its copy of the cluster configuration is corrupt. This leads to a failure state where each server is partly treated as a member of a failover cluster and partly not. In this scenario no node can host the core cluster resources, and everything is broken.

Once a critical failure of the cluster service has occurred, the availability group configuration can and will be dropped from SQL Server as well.

I have not had any luck attempting to recover the CLUSDB, and have in these cases resorted to reinstalling failover clustering services and reconfiguring the entire cluster and availability groups.

I believe part of the problem in this cluster setup is that the file share witness does not keep a copy of the cluster configuration. All it does is contribute a vote toward quorum, which decides which side of a split stays online.
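To make the voting part concrete, here is a rough sketch of the majority arithmetic for a setup with two nodes and a file share witness. It is only an illustration: the function name `has_quorum` is made up, and it ignores dynamic quorum and any other adjustments the cluster service may make on its own.

```python
def has_quorum(reachable_votes: int, total_votes: int) -> bool:
    """Strict majority rule: more than half of all configured votes."""
    return reachable_votes > total_votes // 2


# Assumed layout from this thread: two nodes + one file share witness.
TOTAL_VOTES = 3

# Normal operation: a node can reach itself, its partner and the witness.
print(has_quorum(3, TOTAL_VOTES))

# Partner node is down, witness still reachable: the survivor keeps quorum.
print(has_quorum(2, TOTAL_VOTES))

# Main switch is down: each node can only count its own vote.
print(has_quorum(1, TOTAL_VOTES))
```

In the last case neither node has a majority, so under this simplified model both sides lose quorum at the same time. That explains the cluster going offline during the outage, but not by itself why the configuration was lost afterwards.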

In any case it's technically not a SQL Server issue, but a failover clustering services issue.

I have been wondering whether replacing the file share witness with a third cluster node (without SQL Server) would make the cluster configuration more robust.
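As a rough way to compare the two layouts, the sketch below counts votes and copies of the cluster configuration in each. The names (`Voter`, `node3`, and so on) are hypothetical, and it rests on the assumption stated above that a file share witness holds no copy of the configuration while a full cluster node does. It does not show that a third node would actually prevent the failure described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Voter:
    name: str
    holds_config_copy: bool  # True if this voter stores the cluster configuration


# Current layout: two nodes plus a file share witness (hypothetical names).
witness_setup = [
    Voter("node1", True),
    Voter("node2", True),
    Voter("file_share_witness", False),
]

# Proposed layout: the witness is replaced by a third node without SQL Server.
three_node_setup = [
    Voter("node1", True),
    Voter("node2", True),
    Voter("node3", True),
]


def majority(voters: list[Voter]) -> int:
    """Smallest number of votes that is more than half of all votes."""
    return len(voters) // 2 + 1


def config_copies(voters: list[Voter]) -> int:
    """Number of voters that hold a copy of the cluster configuration."""
    return sum(v.holds_config_copy for v in voters)


for label, setup in [("witness", witness_setup), ("three nodes", three_node_setup)]:
    print(label, "majority:", majority(setup), "config copies:", config_copies(setup))
```

Both layouts need the same two-out-of-three majority. The only difference in this model is that the three-node layout has one more copy of the configuration to fall back on.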