availability group
SQL Server AG Failover - Automatic Failover
Hello, I am looking for a straight, definitive answer that I was hoping someone could give me. I want to trust what Copilot says, but I really need to hear it from Microsoft, and I can't find any Microsoft documentation confirming my question.

My environment:
- 2 replicas in datacenter 1
- 1 replica in datacenter 2
- All three replicas are set to synchronous-commit mode with automatic failover.

I tested failover manually between all three replicas without issue. To test automatic failover, I take down both replicas in datacenter 1 at the same time to simulate a datacenter outage. The replica in datacenter 2 then just shows (Resolving...) next to the replica name; it does not come online and the DB is not moved.

When I searched, I couldn't find out why, so I turned to Copilot - not to solve the issue, but to see if it could point me in the right direction. I described my setup and what happened. Copilot responded that, by design, you cannot have more than two replicas set to synchronous-commit mode with automatic failover in a SQL Server AG; if more than two are set for automatic failover, the AG will use the first two replicas it sees in its metadata and ignore the rest. Copilot went into detail about why it is designed this way, but including all of that would make this post even longer than it already is. If this is true, then when I took down both replicas in datacenter 1, the AG only saw those two replicas as available automatic-failover targets, which would explain why the replica in datacenter 2 did not come online and the DB was not moved.

So let's do a test. I brought the two replicas in datacenter 1 back up, then made a change in the AG properties: I set the second replica in datacenter 1 to manual failover. Datacenter 1 now has one replica set to automatic failover and one set to manual, while the replica in datacenter 2 remains set to automatic failover. I then took down both replicas in datacenter 1 again to simulate the "outage," and this time the replica in datacenter 2 came online and the DB was moved.

So is Copilot right? Can only two replicas have/use automatic failover? I cannot find a definitive answer confirming this. Or is my configuration wrong or missing something - and if so, could you please point me in the right direction on how to get this resolved?
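For reference, this is roughly how I checked each replica's configured modes during the tests - a minimal sketch against the standard catalog views (run on any replica; nothing here is specific to my environment):

```sql
-- List each replica's availability mode and failover mode per AG
SELECT ag.name                    AS ag_name,
       ar.replica_server_name,
       ar.availability_mode_desc, -- SYNCHRONOUS_COMMIT / ASYNCHRONOUS_COMMIT
       ar.failover_mode_desc      -- AUTOMATIC / MANUAL
FROM sys.availability_groups AS ag
JOIN sys.availability_replicas AS ar
  ON ar.group_id = ag.group_id
ORDER BY ag.name, ar.replica_server_name;
```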
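And this is the kind of command I ran to flip the second datacenter-1 replica to manual failover for the second test - a sketch with placeholder names ([MyAG] and 'DC1-NODE2' stand in for my real AG and server names):

```sql
-- Change one replica's failover mode from automatic to manual (run on the primary)
ALTER AVAILABILITY GROUP [MyAG]
MODIFY REPLICA ON 'DC1-NODE2'
WITH (FAILOVER_MODE = MANUAL);
```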
SQL Server Distributed AG's Forwarder Is Not Syncing After Primary AG's Internal Failover

I have set up a SQL Server Distributed Availability Group (DAG) in Kubernetes using SQL Server on Ubuntu images. The setup consists of two availability groups (AGs) across two separate clusters.

Setup details:
- Primary cluster (AG1): pods ag1-0 (primary), ag1-1, ag1-2. The primary is exposed via a LoadBalancer service.
- Remote cluster (AG2): pods ag2-0 (the primary of AG2, acting as the forwarder of the DAG), ag2-1, ag2-2. The forwarder (ag2-0) is exposed via a LoadBalancer service.
- Distributed AG configuration: AG1 and AG2 are part of the DAG. Each AG's primary is dynamically selected using the pod label role=primary. The LISTENER_URL in the DAG configuration points to the LoadBalancer service of each AG (a sketch of the DAG definition is at the end of this post).

Issue: DAG not syncing after AG1 failover

For testing, I triggered a failover in AG1 using:

`ALTER AVAILABILITY GROUP [AG1] FORCE_FAILOVER_ALLOW_DATA_LOSS;`

The global primary changed from ag1-0 to ag1-1, and I updated the role=primary label accordingly (removed it from ag1-0, added it to ag1-1). However, AG2 (the forwarder and its replicas) stopped syncing and became unhealthy. In the ag2-0 (forwarder) logs I only see connection timeouts and disconnections from the global primary. AG2 is not automatically reconnecting to the new primary (ag1-1), even though the LoadBalancer service in LISTENER_URL now points to ag1-1.

The ag2-0 (forwarder) log shows messages like:

> A connection timeout has occurred while attempting to establish a connection to GLOBAL PRIMARY. Either a networking or firewall issue exists, or the endpoint address provided for the replica is not the database mirroring endpoint of the host server instance.

Steps I tried:
- Checked the DAG configuration: LISTENER_URL is correctly set to the LoadBalancer of AG1, which now points to ag1-1.
- Ran the resume command `ALTER DATABASE [agtestdb] SET HADR RESUME;` - this did not resolve the issue.
- Verified network connectivity.

Questions:
- What steps are required to ensure AG2 correctly syncs with the new global primary (ag1-1) after AG1's internal failover?
- Is there a specific command that needs to be run on the forwarder (ag2-0) or the new global primary (ag1-1) to re-establish synchronization?
- Why isn't AG2 automatically reconnecting, even though the LoadBalancer service points to the correct primary?
- Are there any best practices for handling SQL Server DAG failovers in Kubernetes?

Any insights would be greatly appreciated!
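For context, the DAG was created roughly like this - a minimal sketch of the standard two-step distributed AG setup, where [distributedag] and the tcp://ag*-lb:5022 listener URLs are placeholders standing in for my actual DAG name and LoadBalancer service addresses:

```sql
-- Step 1: run on the AG1 primary (the global primary)
CREATE AVAILABILITY GROUP [distributedag]
   WITH (DISTRIBUTED)
   AVAILABILITY GROUP ON
      'AG1' WITH
      (
         LISTENER_URL = 'tcp://ag1-lb:5022',
         AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
         FAILOVER_MODE = MANUAL,
         SEEDING_MODE = AUTOMATIC
      ),
      'AG2' WITH
      (
         LISTENER_URL = 'tcp://ag2-lb:5022',
         AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
         FAILOVER_MODE = MANUAL,
         SEEDING_MODE = AUTOMATIC
      );

-- Step 2: run on the AG2 primary (the forwarder, ag2-0) to join the DAG
ALTER AVAILABILITY GROUP [distributedag] JOIN
   AVAILABILITY GROUP ON
      'AG1' WITH
      (
         LISTENER_URL = 'tcp://ag1-lb:5022',
         AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
         FAILOVER_MODE = MANUAL,
         SEEDING_MODE = AUTOMATIC
      ),
      'AG2' WITH
      (
         LISTENER_URL = 'tcp://ag2-lb:5022',
         AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
         FAILOVER_MODE = MANUAL,
         SEEDING_MODE = AUTOMATIC
      );
```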
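This is the query I used to watch the DAG's health from the global primary while testing - a sketch using the standard catalog views and DMVs (note that for a distributed AG, replica_server_name holds the member AG names rather than server names):

```sql
-- Check DAG-level connection and sync state (run on the global primary)
SELECT ag.name                         AS dag_name,
       ar.replica_server_name          AS member_ag,
       ars.role_desc,                  -- PRIMARY (global primary) / SECONDARY (forwarder side)
       ars.connected_state_desc,       -- CONNECTED / DISCONNECTED
       ars.synchronization_health_desc -- HEALTHY / PARTIALLY_HEALTHY / NOT_HEALTHY
FROM sys.availability_groups AS ag
JOIN sys.availability_replicas AS ar
  ON ar.group_id = ag.group_id
JOIN sys.dm_hadr_availability_replica_states AS ars
  ON ars.replica_id = ar.replica_id
WHERE ag.is_distributed = 1;
```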
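One thing I considered while checking the DAG configuration was re-asserting AG1's listener URL in the DAG metadata from the new global primary, in case the forwarder was still holding a stale endpoint - a sketch only, assuming the DAG is named [distributedag] and the URL is my AG1 LoadBalancer address:

```sql
-- Re-assert AG1's listener URL in the DAG metadata (run on the global primary);
-- 'tcp://ag1-lb:5022' is a placeholder for the AG1 LoadBalancer address
ALTER AVAILABILITY GROUP [distributedag]
MODIFY AVAILABILITY GROUP ON
   'AG1' WITH (LISTENER_URL = 'tcp://ag1-lb:5022');
```

I haven't confirmed whether this is required after an internal failover of the member AG, which is part of what I'm asking below.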