Forum Discussion
SQL Server AG Failover - Automatic Failover
Microsoft documentation claims you can have up to five replicas with synchronous commit and automatic failover.
SQL Server 2019 (15.x) increases the maximum number of synchronous replicas to 5, up from 3 in SQL Server 2017 (14.x). You can configure this group of five replicas to have automatic failover within the group. There's one primary replica, plus four synchronous secondary replicas.
What is an Always On availability group? - SQL Server Always On | Microsoft Learn
I believe in your first scenario, you met this constraint:
Automatic failover requires that both the primary replica and the target secondary replica run under synchronous-commit mode with the failover mode set to Automatic. In addition, the secondary replica must already be synchronized, have WSFC quorum, and meet the conditions specified by the flexible failover policy of the availability group.
Scenario 1:
- All three replicas set to automatic failover (2 on Site 1, 1 on Site 2)
- Stop both nodes on Site 1
- Last node in resolving state
I would assume that in this scenario, the failover cluster first moved the primary to the second replica on Site 1.
When that replica was removed, the WSFC no longer had quorum, and the replica on Site 2 could not be elected as primary. (or it hadn't re-synchronized after initial failover)
The loss of quorum would be especially true if you brought down both site 1 nodes simultaneously.
Scenario 2:
- One replica Site 1 and Site 2 automatic failover, one replica Site 1 manual failover
- Stop both nodes on Site 1
- Last node is primary
For this scenario, if you brought down the Site 1 nodes in series, starting with the node configured for automatic failover, WSFC had quorum and the replicas were synchronized. Failover occurred when the first node went down.
--
A Secondary replica will never promote to primary, if it cannot reach quorum.
A Primary replica may stay as a primary replica, if it still has the votes (otherwise it'll also stop)
Quorum is based on votes. For SQL Server Availability Groups, all replicas that participate in automatic failover should have a vote. You may have configured a "witness". It has voting power, but cannot run cluster resources (and does not store cluster configuration).
The necessary votes to have quorum is "(all possible votes / 2) + 1".
For three nodes, that means two votes required.
These measures are in place to ensure that you do not end up with two replicas acting as primaries (split-brain).
That could result in replicas getting different updates, and you get a mess on your hands.