SQL Server Distributed AG's Forwarder Is Not Syncing After Primary AG's Internal Failover

I have set up a SQL Server Distributed Availability Group (DAG) in Kubernetes using SQL Server on Ubuntu images. The setup consists of two availability groups (AGs) across two separate clusters: Set...

availability group

distributed

failover

SivertSolem

Iron Contributor

Feb 12, 2025

`ALTER AVAILABILITY GROUP [AG1] FORCE_FAILOVER_ALLOW_DATA_LOSS;`

There's your starting issue.

Forces failover of the availability group, with possible data loss, to the failover target. The failover target will take over the primary role and recover its copy of each database and bring them online as the new primary databases. On any remaining secondary replicas, every secondary database is suspended until manually resumed. When the former primary replica becomes available, it will switch to the secondary role, and its databases will become suspended secondary databases.
ALTER AVAILABILITY GROUP (Transact-SQL) - SQL Server | Microsoft Learn

What you most likely wanted was the `ALTER AVAILABILITY GROUP [AG1] FAILOVER;` command.

FORCE_FAILOVER_ALLOW_DATA_LOSS is only for situations where the primary is unavailable, automatic failover has not occurred, and the regular FAILOVER command does not work.
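A planned `FAILOVER` without data loss requires the target secondary to be synchronized first. As a rough pre-check, something like the following could be run on the intended failover target. It is only a sketch: the AG name `AG1` is taken from the thread, and the query assumes you are connected to the secondary you plan to promote.

```sql
-- Run on the secondary replica you intend to fail over to.
-- Each database in the AG should report SYNCHRONIZED (and not suspended)
-- before a planned FAILOVER is attempted.
SELECT
    ag.name                        AS ag_name,
    DB_NAME(drs.database_id)       AS database_name,
    drs.synchronization_state_desc,
    drs.is_suspended
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_groups AS ag
    ON ag.group_id = drs.group_id
WHERE drs.is_local = 1
  AND ag.name = N'AG1';   -- AG name as used in this thread
```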

As for why resuming doesn't work, I have to admit I don't have enough hands-on experience with DAGs to say.
I would suggest verifying whether the databases are suspended on the primary as well, and running a `MODIFY` on your global primary that re-applies your current modes.
Resetting `SEEDING_MODE`, for example, is how you'd restart (or cancel) automatic seeding attempts.

```sql
-- Run on the global primary. Replace each YourCurrentMode placeholder
-- with the value the distributed AG is already configured with.
ALTER AVAILABILITY GROUP [DAG1]
   MODIFY
   AVAILABILITY GROUP ON
      'AG1' WITH
      (
         AVAILABILITY_MODE = YourCurrentMode,
         FAILOVER_MODE = YourCurrentMode,
         SEEDING_MODE = YourCurrentMode
      ),
      'AG2' WITH
      (
         AVAILABILITY_MODE = YourCurrentMode,
         FAILOVER_MODE = YourCurrentMode,
         SEEDING_MODE = YourCurrentMode
      );
GO
```
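To check whether any databases are suspended, the replica-state DMV can be queried on the primary, which reports rows for every replica (a secondary only reports its own). This is a minimal sketch using documented DMV columns; nothing in it is specific to the poster's setup.

```sql
-- Run on the primary replica to see the state of all replicas.
-- is_suspended = 1 means data movement is paused for that database;
-- suspend_reason_desc indicates why (for example, a user action or a failover).
SELECT
    ar.replica_server_name,
    DB_NAME(drs.database_id)       AS database_name,
    drs.is_suspended,
    drs.suspend_reason_desc,
    drs.synchronization_state_desc
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar
    ON ar.replica_id = drs.replica_id
ORDER BY ar.replica_server_name, database_name;
```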

neajmorshad

Copper Contributor

Feb 24, 2025

Because I handle failover manually, the availability groups are created with `FAILOVER_MODE = MANUAL` (leader election and failover decisions follow etcd's Raft implementation). I only fail over when the primary is unavailable, so `FORCE_FAILOVER_ALLOW_DATA_LOSS` has to be used.

My primary AG (AG1) is properly synced and healthy. The databases were suspended after running the force failover command, but I resumed them manually, and the old primary ag1-0 rejoined successfully. The new primary is taking new writes, and all the secondaries, including the old primary ag1-0, are syncing with the latest global primary, ag1-1. So I guess it's not an issue with the `FORCE_FAILOVER_ALLOW_DATA_LOSS` command.

I also tried `ALTER AVAILABILITY GROUP [DAG] MODIFY` on the global primary to reset `SEEDING_MODE` and restart automatic seeding, but that didn't solve the issue either.
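For reference, resuming a suspended database after a forced failover is done per database, on the replica that hosts the suspended copy. A minimal sketch, where `[YourDatabase]` is a placeholder rather than a name from this thread:

```sql
-- Run on the replica hosting the suspended database.
-- [YourDatabase] is a placeholder; repeat for each suspended database.
ALTER DATABASE [YourDatabase] SET HADR RESUME;
```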
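When the forwarder stays out of sync after these steps, it may help to look at what the distributed AG itself reports. The queries below are a sketch using documented catalog views and DMVs; they could be run on the global primary and again on the forwarder to compare. The first shows each member AG's role, connection state, and the endpoint URL the distributed AG is configured with; the second shows recent automatic seeding attempts and any failure reason.

```sql
-- Distributed AG membership, role, and connectivity as seen from this instance.
SELECT
    ag.name                          AS dag_name,
    ar.replica_server_name           AS member_ag,
    ar.endpoint_url,
    ars.role_desc,
    ars.connected_state_desc,
    ars.synchronization_health_desc
FROM sys.availability_groups AS ag
JOIN sys.availability_replicas AS ar
    ON ar.group_id = ag.group_id
LEFT JOIN sys.dm_hadr_availability_replica_states AS ars
    ON ars.replica_id = ar.replica_id
WHERE ag.is_distributed = 1;

-- Recent automatic seeding attempts and their outcome.
SELECT
    start_time,
    completion_time,
    current_state,
    failure_state_desc,
    error_code,
    number_of_attempts
FROM sys.dm_hadr_automatic_seeding
ORDER BY start_time DESC;
```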
