Forum Discussion
Availability Group failover issue
It looks like node B wasn't fully ready when the automatic failover was attempted, even though it appeared to be in sync. SQL Server only fails over automatically if every database in the availability group is "failover ready" on the target secondary; if any one database lags or isn't joined, the failover won't happen. I'd check is_failover_ready for each database, review the failover policy (FailureConditionLevel and HealthCheckTimeout), and run regular manual failover tests to confirm the secondaries can take over.
MS Article: https://learn.microsoft.com/en-us/sql/database-engine/availability-groups/windows/configure-flexible-automatic-failover-policy?view=sql-server-ver17
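A minimal Python sketch of the is_failover_ready check, assuming you run the embedded T-SQL on a replica (typically the primary) through whatever driver you use and pass the result rows in as dicts keyed by column name. The helper names failover_blockers and failover_policy_sql and the replica names are hypothetical. The view columns (is_failover_ready, is_database_joined, availability_mode_desc, failover_mode_desc) and the FAILURE_CONDITION_LEVEL / HEALTH_CHECK_TIMEOUT options are SQL Server's own, but verify them against the linked article for your version.

```python
"""Sketch: list databases that would block automatic failover to a target replica.

Assumption: rows come from running CHECK_QUERY and are converted to dicts keyed
by column name; no database driver is used here.
"""

CHECK_QUERY = """
SELECT ar.replica_server_name,
       ar.availability_mode_desc,
       ar.failover_mode_desc,
       drcs.database_name,
       drcs.is_database_joined,
       drcs.is_failover_ready
FROM sys.dm_hadr_database_replica_cluster_states AS drcs
JOIN sys.availability_replicas AS ar
  ON ar.replica_id = drcs.replica_id;
"""


def failover_blockers(rows, target_replica):
    """Return (database_name, reason) pairs that would stop target_replica from
    taking over automatically. An empty list means every database looked ready."""
    blockers = []
    for row in rows:
        if row["replica_server_name"] != target_replica:
            continue  # only inspect the replica we want to fail over to
        db = row["database_name"]
        if row["availability_mode_desc"] != "SYNCHRONOUS_COMMIT":
            blockers.append((db, "replica is not in synchronous-commit mode"))
        elif row["failover_mode_desc"] != "AUTOMATIC":
            blockers.append((db, "replica failover mode is not AUTOMATIC"))
        elif not row["is_database_joined"]:
            blockers.append((db, "database is not joined on this replica"))
        elif not row["is_failover_ready"]:
            blockers.append((db, "database is not failover ready"))
    return blockers


def failover_policy_sql(ag_name, failure_condition_level=3, health_check_timeout_ms=30000):
    """Build T-SQL that sets the flexible failover policy (defaults: level 3, 30000 ms)."""
    if not 1 <= failure_condition_level <= 5:
        raise ValueError("FAILURE_CONDITION_LEVEL must be between 1 and 5")
    if health_check_timeout_ms < 15000:
        raise ValueError("HEALTH_CHECK_TIMEOUT must be at least 15000 ms")
    quoted = "[" + ag_name.replace("]", "]]") + "]"  # bracket-quote the AG name
    return (
        f"ALTER AVAILABILITY GROUP {quoted} SET (FAILURE_CONDITION_LEVEL = {failure_condition_level});\n"
        f"ALTER AVAILABILITY GROUP {quoted} SET (HEALTH_CHECK_TIMEOUT = {health_check_timeout_ms});"
    )
```

An empty result only reflects the moment the query ran; as this thread shows, readiness can change within seconds, so it is worth running the check on a schedule rather than once.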
- rj452tm (Copper Contributor), Aug 28, 2025
I think you're on the right track. We perform a manual failover at least once a month and have never had this issue. Also, the manual failover succeeded after the automatic failover failed, so this seems to be specific to automatic failover.
My theory is that after the AG initially failed on node A, the cluster restarted it there; it came back up briefly on node A, failed again, and only then did the cluster attempt an automatic failover to node B. During that short window when the AG was back up on node A, I suspect the secondary replica on node B didn't have time to reconnect and fully resynchronize before the AG failed again. So when the cluster tried to fail over to node B, it couldn't, because node B wasn't synchronized. Is that possible? I'm not entirely confident in this theory, because it doesn't explain how node B was eventually able to become failover ready when we performed the manual failover.
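One way to test this theory would be to poll synchronization_state_desc from sys.dm_hadr_database_replica_states on node B every few seconds and then check whether each database was SYNCHRONIZED at the moment the cluster attempted the automatic failover. The polling job is an assumption, not something described in this thread; the sketch below assumes the samples are already collected as (timestamp, database_name, state) tuples sorted by time, and the function names and timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def not_ready_windows(samples):
    """Group time-sorted (timestamp, database_name, state) samples into
    {database_name: [(start, end), ...]} windows where the state was not
    SYNCHRONIZED. end is the first SYNCHRONIZED sample after the window,
    or None if the database never recovered within the samples."""
    windows = {}
    open_since = {}
    for ts, db, state in samples:
        if state != "SYNCHRONIZED":
            open_since.setdefault(db, ts)  # remember when the window started
        elif db in open_since:
            windows.setdefault(db, []).append((open_since.pop(db), ts))
    for db, start in open_since.items():
        windows.setdefault(db, []).append((start, None))  # still not synchronized
    return windows


def was_ready_at(windows, db, moment):
    """True if no not-SYNCHRONIZED window for db covers the given moment."""
    for start, end in windows.get(db, []):
        if start <= moment and (end is None or moment < end):
            return False
    return True


if __name__ == "__main__":
    t0 = datetime(2025, 8, 28, 2, 0, 0)  # hypothetical incident time
    samples = [
        (t0, "Sales", "SYNCHRONIZED"),
        (t0 + timedelta(seconds=10), "Sales", "NOT SYNCHRONIZING"),
        (t0 + timedelta(seconds=20), "Sales", "SYNCHRONIZING"),
        (t0 + timedelta(seconds=40), "Sales", "SYNCHRONIZED"),
    ]
    w = not_ready_windows(samples)
    print(was_ready_at(w, "Sales", t0 + timedelta(seconds=25)))
```

Because the windows are built from samples, their edges are only as precise as the polling interval, so a short resynchronization gap can be missed if you poll too slowly.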
In any case, I'm still not confident that automatic failover will work correctly if a similar issue occurs. The safety we hoped to gain by keeping all nodes in synchronous-commit mode didn't pay off, and I don't know what to change to ensure it works in the future.
- SivertSolem (Iron Contributor), Aug 28, 2025
Also verify that all databases in the availability groups exist on node B.
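A rough sketch of that check in Python, assuming PRIMARY_QUERY is run on the primary and SECONDARY_QUERY on node B, with each result converted to a list of dicts keyed by column name. The function name missing_on_secondary and the sample names are hypothetical; sys.availability_databases_cluster, sys.availability_groups, and the replica_id column of sys.databases are standard SQL Server catalog views and columns.

```python
"""Sketch: report AG databases that are missing, or present but not joined, on node B.

Assumption: PRIMARY_QUERY runs on the primary and SECONDARY_QUERY on node B,
with each result converted to a list of dicts keyed by column name.
"""

PRIMARY_QUERY = """
SELECT ag.name AS ag_name, adc.database_name
FROM sys.availability_databases_cluster AS adc
JOIN sys.availability_groups AS ag
  ON ag.group_id = adc.group_id;
"""

# On node B, replica_id in sys.databases is NULL when the database is not part of an AG.
SECONDARY_QUERY = """
SELECT name AS database_name, replica_id
FROM sys.databases;
"""


def missing_on_secondary(primary_rows, secondary_rows):
    """Return {ag_name: [(database_name, problem), ...]} for databases needing attention."""
    present = {r["database_name"] for r in secondary_rows}
    joined = {r["database_name"] for r in secondary_rows if r["replica_id"] is not None}
    report = {}
    for r in primary_rows:
        db = r["database_name"]
        if db not in present:
            problem = "missing on secondary"
        elif db not in joined:
            problem = "present but not joined to an availability group"
        else:
            continue  # exists and belongs to an AG on the secondary
        report.setdefault(r["ag_name"], []).append((db, problem))
    return report
```

This only checks that each database belongs to some availability group on node B, not that it is joined to the same one; the is_database_joined column in sys.dm_hadr_database_replica_cluster_states is a more direct check for that.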