REQUIRED_SYNCHRONIZED_SECONDARIES_TO_COMMIT and Error 988 (SQL Server 2017 Always On)
Published Jun 07 2020 07:17 PM
Microsoft

Recently a customer reported that adding a database to an existing availability group (AG) failed with error 988:

Msg 988, Level 14, State 1, Line 17

Unable to access database 'ag_03' because it lacks a quorum of nodes for high availability. Try the operation again later.


At that point, no backup or restore had been performed on any node.

 

Based on this message, we would usually check the SQL Server error log and the cluster log, and run cluster validation to confirm that the cluster is healthy. In this case, however, the root cause was not a cluster problem.
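Alongside the error log and cluster log, the cluster and quorum state as SQL Server sees it can be queried directly from T-SQL. The sketch below uses the sys.dm_hadr_cluster and sys.dm_hadr_cluster_members DMVs and can be run on any replica; it only reports what the instance sees and is not a substitute for cluster validation.

```sql
-- Quorum model and current quorum state reported by the WSFC to this instance
SELECT cluster_name,
       quorum_type_desc,
       quorum_state_desc      -- NORMAL_QUORUM is expected on a healthy cluster
FROM sys.dm_hadr_cluster;

-- Each cluster member (node, file share or disk witness), its state and its votes
SELECT member_name,
       member_type_desc,
       member_state_desc,
       number_of_quorum_votes
FROM sys.dm_hadr_cluster_members;
```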

 

Here is the AG environment:

A 3-node AG on SQL Server 2017 with one primary replica, one synchronous-commit secondary, and one asynchronous-commit secondary.

The customer had already run cluster validation and found no issues, and the AG had been running without problems for four months.
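A replica layout like this can be confirmed from the primary with the AG catalog views. This is a minimal sketch; AG01 is the AG name used in our lab scripts and is an assumption for any other environment.

```sql
-- Commit mode, failover mode and seeding mode of each replica in AG01
SELECT ag.name AS ag_name,
       ar.replica_server_name,
       ar.availability_mode_desc,   -- SYNCHRONOUS_COMMIT or ASYNCHRONOUS_COMMIT
       ar.failover_mode_desc,       -- AUTOMATIC or MANUAL
       ar.seeding_mode_desc         -- AUTOMATIC or MANUAL (SQL Server 2016 and later)
FROM sys.availability_groups AS ag
JOIN sys.availability_replicas AS ar
  ON ag.group_id = ar.group_id
WHERE ag.name = N'AG01';
```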

 

 

Troubleshooting

 

We created the following Extended Events (XEvent) session in both our lab and the customer's environment to compare their behavior.

 

-- Records AG DDL execution plus commit-policy and harden-state changes for AG databases
CREATE EVENT SESSION [ag_state_change] ON SERVER
ADD EVENT sqlserver.alwayson_ddl_executed(
    ACTION(sqlos.system_thread_id,sqlserver.session_id,sqlserver.sql_text)),
ADD EVENT sqlserver.hadr_db_commit_mgr_harden(
    ACTION(sqlos.system_thread_id,sqlserver.session_id,sqlserver.sql_text)),
ADD EVENT sqlserver.hadr_db_commit_mgr_set_policy(
    ACTION(sqlos.system_thread_id,sqlserver.session_id,sqlserver.sql_text)),
ADD EVENT sqlserver.hadr_db_commit_mgr_update_harden(
    ACTION(sqlos.system_thread_id,sqlserver.session_id,sqlserver.sql_text)),
ADD EVENT sqlserver.hadr_db_partner_set_policy(
    ACTION(sqlos.system_thread_id,sqlserver.session_id,sqlserver.sql_text)),
ADD EVENT sqlserver.hadr_db_partner_set_sync_state(
    ACTION(sqlos.system_thread_id,sqlserver.session_id,sqlserver.sql_text))
-- No path given, so the .xel file is written to the instance's default LOG directory
ADD TARGET package0.event_file(SET filename=N'ag_state_change')
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=OFF)
GO

-- STARTUP_STATE=OFF only disables auto-start with the service; start the session before reproducing the issue
ALTER EVENT SESSION [ag_state_change] ON SERVER STATE = START;
GO
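After the issue has been reproduced, the captured events can be read back in time order with sys.fn_xe_file_target_read_file. This is a sketch that assumes the file was written to the default LOG directory of a default SQL Server 2017 instance (the same folder used by the backup in Part 1); adjust the path for a named instance or a different install location.

```sql
-- Optional: stop the session once the issue has been reproduced
ALTER EVENT SESSION [ag_state_change] ON SERVER STATE = STOP;
GO

-- Assumed path: default instance LOG folder; the wildcard picks up rollover files
WITH captured AS (
    SELECT CAST(event_data AS xml) AS event_xml
    FROM sys.fn_xe_file_target_read_file(
        N'C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\Log\ag_state_change*.xel',
        NULL, NULL, NULL)
)
SELECT event_xml.value('(/event/@timestamp)[1]', 'datetime2')   AS event_time_utc,
       event_xml.value('(/event/@name)[1]', 'nvarchar(128)')    AS event_name,
       event_xml                                                AS event_details  -- full payload, including policy/harden fields
FROM captured
ORDER BY event_time_utc;
```

Returning the full event XML avoids guessing the per-event field names; the policy and harden values shown in the screenshots can be read from event_details.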

 

In our lab, before the new database is joined on the secondary, the database commit policy is DoNothing while the DDL statement executes, and the harden status is NoCommitFailure.

 

--Part 1: run on the primary replica
ALTER AVAILABILITY GROUP [AG01]
ADD DATABASE AG_03;

BACKUP DATABASE AG_03 TO DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\Log\ag_db03.bak' WITH COPY_ONLY, FORMAT, INIT, SKIP, REWIND, NOUNLOAD, COMPRESSION, STATS = 5;

[Screenshot: lab XEvent trace for Part 1]

 

After the database is added, its commit policy changes to WaitForHarden, while the harden status remains NoCommitFailure.

 

--Part 2: run on the secondary replica (the database must already be restored there WITH NORECOVERY)
ALTER DATABASE [ag_03] SET HADR AVAILABILITY GROUP = [AG01];

[Screenshot: lab XEvent trace for Part 2]
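Once the join has succeeded, the state of the new database on every replica can be checked from the primary. A minimal sketch; ag_03 is the lab database name, and filtering through sys.availability_databases_cluster keeps the rows for remote replicas in the result.

```sql
-- Run on the primary: one row per replica for the ag_03 availability database
SELECT ar.replica_server_name,
       drs.is_local,
       drs.synchronization_state_desc,   -- e.g. SYNCHRONIZED, SYNCHRONIZING, NOT SYNCHRONIZING
       drs.synchronization_health_desc
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar
  ON drs.replica_id = ar.replica_id
JOIN sys.availability_databases_cluster AS adc
  ON drs.group_database_id = adc.group_database_id
WHERE adc.database_name = N'ag_03';
```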

In the customer's environment, the major difference is that the hadr_db_commit_mgr_set_policy event in the trace below shows the policy switching to WaitForHarden immediately after the database is added to the AG, and the harden status is MinSyncCommitFailure. As a result, this transaction never commits, let alone the statements that follow it.

 

[Screenshot: XEvent trace from the customer's environment]

 

On further investigation, we found that the customer's AG had REQUIRED_SYNCHRONIZED_SECONDARIES_TO_COMMIT configured, an option introduced in SQL Server 2017. The documentation defines it as follows:

 

"Used to set a minimum number of synchronous secondary replicas required to commit before the primary commits a transaction. Guarantees that SQL Server transactions will wait until the transaction logs are updated on the minimum number of secondary replicas…When replicas are in synchronous commit mode, writes on the primary replica wait until writes on the secondary synchronous replicas are committed to the replica database transaction log. "

(https://docs.microsoft.com/en-us/sql/t-sql/statements/alter-availability-group-transact-sql?view=sql...)
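Whether an AG uses this option can be checked on the primary through sys.availability_groups; a value of 0 (the default) means no minimum number of synchronized secondaries is enforced. A short sketch:

```sql
-- Column available in SQL Server 2017 and later
SELECT name,
       required_synchronized_secondaries_to_commit
FROM sys.availability_groups;
```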

 

This explains why the database commit policy changes to WaitForHarden as soon as the new database is added on the primary: the primary will not commit the transaction until the required synchronous secondary has hardened it. From the secondary's point of view, however, the new database can only be joined after that transaction has committed on the primary. Each side waits for the other, so with this option enabled the add operation cannot complete.

 

The fix is straightforward: either disable this option, or use automatic seeding or the Join only data synchronization option to synchronize the data with the secondary.
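Both workarounds can be scripted in T-SQL. The sketch below runs on the primary unless noted. AG01 and ag_03 come from the lab scripts; the replica names SQLNODE2 and SQLNODE3 are hypothetical placeholders, and the value 1 used when restoring the option is only an example, so record the original value (for instance from sys.availability_groups) and put that back.

```sql
/* Option A: temporarily disable the option while the database is added */
ALTER AVAILABILITY GROUP [AG01]
SET (REQUIRED_SYNCHRONIZED_SECONDARIES_TO_COMMIT = 0);
GO
-- ...add the database to AG01 and join it on the secondaries as usual...
-- Then restore the original setting (1 is an example value, not the customer's)
ALTER AVAILABILITY GROUP [AG01]
SET (REQUIRED_SYNCHRONIZED_SECONDARIES_TO_COMMIT = 1);
GO

/* Option B: let automatic seeding create the database on the secondaries */
-- On the primary: switch each secondary replica to automatic seeding
ALTER AVAILABILITY GROUP [AG01]
MODIFY REPLICA ON N'SQLNODE2' WITH (SEEDING_MODE = AUTOMATIC);  -- hypothetical replica name
ALTER AVAILABILITY GROUP [AG01]
MODIFY REPLICA ON N'SQLNODE3' WITH (SEEDING_MODE = AUTOMATIC);  -- hypothetical replica name
GO
-- On each secondary: allow the AG to create the seeded database
ALTER AVAILABILITY GROUP [AG01] GRANT CREATE ANY DATABASE;
GO
-- Back on the primary: add the database; seeding then starts automatically
ALTER AVAILABILITY GROUP [AG01] ADD DATABASE [ag_03];
GO
```

While the option is set to 0, the primary no longer enforces the minimum number of synchronized secondaries, so Option A is best kept to a short maintenance window.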

 

Enjoy tech :)

Last update: Aug 12 2020 12:09 AM