Availability Group failover issue

rj452tm
Copper Contributor
Aug 28, 2025
I think you're on the right track. We fail over at least once a month and have never had this issue. The manual failover also succeeded after the automatic failover failed, so this seems to be specific to automatic failover.
I think that after the AG initially failed on node A, the cluster tried to restart it and it briefly came back up on node A, failed again, and then the cluster tried to perform an auto-failover to node B. During that short time when the AG was up on node A after the initial failure, I suspect that the secondary replica on node B wasn't able to connect and become fully re-synchronized before the AG failed again. So when the cluster tried to fail it over to node B, it couldn't because node B wasn't synchronized. Is that possible? I'm not entirely confident in this theory because it doesn't explain how node B was eventually able to become failover-ready when we performed the manual failover.
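To make the theory above concrete, here is a small Python sketch of the readiness rule involved: a synchronous-commit secondary can only be an automatic failover target while every database in the AG is SYNCHRONIZED on it. This is a simplified model, not how the cluster actually evaluates a replica. The names `DbState` and `is_failover_ready`, the database names, and the timeline itself are all made up for illustration. In particular, the last step assumes node B eventually caught up, which nothing in this thread confirms.

```python
from enum import Enum


class DbState(Enum):
    """Per-database synchronization states (simplified, illustrative)."""
    NOT_SYNCHRONIZING = "NOT SYNCHRONIZING"
    SYNCHRONIZING = "SYNCHRONIZING"
    SYNCHRONIZED = "SYNCHRONIZED"


def is_failover_ready(db_states):
    """Return True only if at least one database exists and all are SYNCHRONIZED."""
    return bool(db_states) and all(
        state is DbState.SYNCHRONIZED for state in db_states.values()
    )


# Hypothetical view of node B at three points in the suspected sequence.
timeline = [
    ("AG restarts briefly on node A",
     {"SalesDb": DbState.SYNCHRONIZING, "HrDb": DbState.NOT_SYNCHRONIZING}),
    ("AG fails again, auto-failover attempted",
     {"SalesDb": DbState.SYNCHRONIZING, "HrDb": DbState.SYNCHRONIZING}),
    ("later, manual failover (assumes node B caught up)",
     {"SalesDb": DbState.SYNCHRONIZED, "HrDb": DbState.SYNCHRONIZED}),
]

for event, states in timeline:
    print(f"{event}: node B failover-ready = {is_failover_ready(states)}")
```

On a real instance the corresponding values should be visible in `sys.dm_hadr_database_replica_states` (`synchronization_state_desc`) and `sys.dm_hadr_database_replica_cluster_states` (`is_failover_ready`), so capturing those around the time of the failure would help confirm or rule out the theory.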
In any case, I still have little confidence that an automatic failover will work correctly if a similar issue occurs. Keeping all replicas in synchronous-commit mode didn't give us the safety we were hoping for, and I don't know what to change to make sure it works in the future.
SivertSolem
Iron Contributor
Aug 28, 2025
Also verify that all databases in the availability groups exist on node B.
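A rough Python sketch of that check, assuming you have already pulled the two lists of database names from each replica: compare what the AG is supposed to contain with what is actually present on node B. The function name `missing_on_secondary` and the database names are hypothetical, and the case-insensitive comparison is an assumption that may not match your collation.

```python
def missing_on_secondary(ag_databases, secondary_databases):
    """Return AG database names not found on the secondary, sorted by name.

    Names are compared case-insensitively (an assumption for this sketch).
    """
    present = {name.casefold() for name in secondary_databases}
    return sorted(name for name in ag_databases if name.casefold() not in present)


# Hypothetical inventories: what the AG should contain vs. what node B has.
expected = ["SalesDb", "HrDb", "AuditDb"]
on_node_b = ["salesdb", "HrDb"]

print(missing_on_secondary(expected, on_node_b))  # ['AuditDb']
```

Any name this returns would be a database that node B cannot bring online after a failover.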
