data consistency
1 TopicRecovering Missing Rows (“Gaps”) in Azure SQL Data Sync — Supported Approaches (and What to Avoid)
Azure SQL Data Sync is commonly used to keep selected tables synchronized between a hub database and one or more member databases. In some cases, you may discover a data “gap”: a subset of rows that exist in the source but are missing on the destination for a specific time window, even though synchronization continues afterward for new changes. This post explains supported recovery patterns and what not to do, based on a real support scenario where a customer reported missing rows for a single table within a sync group and requested a way to synchronize only the missing records. The scenario: Data Sync continues, but some rows are missing In the referenced case, the customer observed that: A specific table had a gap on the member side (missing rows for a period), while newer data continued to sync normally afterward. They asked for a Microsoft-supported method to sync only the missing rows, without rebuilding or fully reinitializing the table. This is a reasonable goal—but the recovery method matters, because Data Sync relies on service-managed tracking artifacts. Temptation: “Can we push missing data by editing tracking tables or calling internal triggers?” A frequent idea is to “force” Data Sync to pick up missing rows by manipulating internal artifacts: Writing directly into Data Sync tracking tables (for example, tables under the DataSync schema such as *_dss_tracking), or altering provisioning markers. Manually invoking Data Sync–generated triggers or relying on their internal logic. The case discussion specifically referenced internal triggers such as _dss_insert_trigger, _dss_update_trigger, and _dss_delete_trigger. Why this is not recommended / not supported as a customer-facing solution In the case, the guidance from Microsoft engineering was clear: Manually invoking internal Data Sync triggers is not supported and can increase the risk of data corruption because these triggers are service-generated at runtime and are not intended for manual use. Directly manipulating Data Sync tracking/metadata tables is not recommended. The customer thread also highlights that these tracking tables are part of Data Sync internals, and using them for manual “push” scenarios is not a supported approach. Also, the customer conversation highlights an important conceptual point: tracking tables are part of how the service tracks changes; they are not meant to be treated as a user-managed replication queue. Supported recovery option #1 (recommended): Re-drive change detection via the base table The most supportable approach is to make Data Sync detect the missing rows through its normal change tracking path—by operating on the base/source table, not the service-managed internals. A practical pattern: “No-op update” to re-fire tracking In the internal discussion with the product team, the recommended pattern was to update the source/base table (even with a “no-op” assignment) so that Data Sync’s normal tracking logic is triggered, without manually invoking internal triggers. Example pattern (conceptual): UPDATE t SET some_column = some_column -- no-op: value unchanged FROM dbo.YourTable AS t WHERE <filter identifying the rows that are missing on the destination>; This approach is called out explicitly in the thread as a way to “re-drive” change detection safely through supported mechanisms. Operational guidance (practical): Apply the update in small batches, especially for large tables, to reduce transaction/log impact and avoid long-running operations. Validate the impacted row set first (for example, by comparing keys between hub and member). Supported recovery option #2: Deprovision and re-provision the affected table (safe “reset” path) If the gap is large, the row-set is hard to isolate, or you want a clean realignment of tracking artifacts, the operational approach discussed in the case was: Stop sync Remove the table from the sync group (so the service deprovisions tracking objects) Fix/clean the destination state as needed Add the table back and let Data Sync re-provision and sync again This option is often the safest when the goal is to avoid touching system-managed artifacts directly. Note: In production environments, customers may not be able to truncate/empty tables due to operational constraints. In that situation, the sync may take longer because the service might need to do more row-by-row evaluation. This “tradeoff” was discussed in the case context. Diagnostics: Use the Azure SQL Data Sync Health Checker When you suspect metadata drift, missing objects, or provisioning inconsistencies, the case recommended using the AzureSQLDataSyncHealthChecker script. This tool: Validates hub/member metadata and scopes against the sync metadata database Produces logs that can highlight missing artifacts and other inconsistencies Is intended to help troubleshoot Data Sync issues faster A likely contributor to “gaps”: schema changes during Data Sync (snapshot isolation conflict) In the case discussion, telemetry referenced an error consistent with concurrent DDL/schema changes while the sync process is enumerating changes (snapshot isolation + metadata changes). A well-known related error is SQL Server error 3961, which occurs when a snapshot isolation transaction fails because metadata was modified by a concurrent DDL statement, since metadata is not versioned. Microsoft documents this behavior and explains why metadata changes conflict with snapshot isolation semantics. Prevention guidance (practical) Avoid running schema deployments (DDL) during active sync windows. Use a controlled workflow for schema changes with Data Sync—pause/coordinate changes to prevent mid-sync metadata shifts. (General best practices exist for ongoing Data Sync operations and maintenance.) Key takeaways Do not treat Data Sync tracking tables/triggers as user-managed “replication internals.” Manually invoking internal triggers or editing tracking tables is not a supported customer-facing recovery mechanism. Do recover gaps via base table operations (insert/update) so the service captures changes through its normal path—“no-op update” is one practical pattern when you already know the missing row set. For large/complex gaps, consider the safe reset approach: remove the table from the sync group and re-add it to re-provision artifacts. Use the AzureSQLDataSyncHealthChecker to validate metadata consistency and reduce guesswork. If you see intermittent failures around deployments, consider the schema-change + snapshot isolation pattern (e.g., error 3961) as a possible contributor and schedule DDL changes accordingly.