Troubleshooting Azure SQL Data Sync Failure: SQL Error 8106 During Bulk Insert
Azure SQL Data Sync is widely used to maintain consistency across distributed databases in hub–member topologies. However, synchronization may occasionally fail due to schema mismatches between participating databases, even when everything appears correctly configured at first glance. In this post, we'll walk through a real-world troubleshooting scenario involving a Data Sync failure caused by a schema inconsistency related to an IDENTITY column, and how it was mitigated.

Sample Error

sync_7726d6cb22124c0f901192c434f49106bd618f8ab16343b2adc03250f8367ff4\3953fb7d-1dba-4656-8150-83153d5d019b.batch. See the inner exception for more details.
Inner exception: Failed to execute the command 'BulkInsertCommand' for table 'schema.table_name'; the transaction was rolled back. Ensure that the command syntax is correct.
Inner exception: SqlException ID: e19b3677-d67e-4c8e-bc49-13d3df61ad0e, Error Code: -2146232060 - SqlError Number: 8106, Message: SQL error with code 8106
For more information, provide tracing ID '92e76130-f80a-4372-9a48-ec0ede8b0288' to customer support.

Scenario Overview

A synchronization operation began failing for a specific table within an Azure SQL Data Sync group. The failure was observed during the sync process when applying changes using a batch file. The error surfaced as part of a failed BulkInsertCommand execution on a synced table, causing the transaction to roll back. Further investigation revealed the following SQL exception:

SqlError Number: 8106 - Table does not have the identity property. Cannot perform SET operation.

Initial Troubleshooting Steps

Before identifying the root cause, the following actions were taken:

1. The affected table was removed from the sync group.
2. A sync operation was triggered.
3. The table was re-added to the sync group.
4. Sync was triggered again.

Despite performing these steps, the issue persisted with the same error. This indicated that the failure was not related to sync metadata or temporary configuration inconsistencies.
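Before digging further, it helps to confirm whether the IDENTITY property actually differs between the hub and member copies of the table. A minimal check, run on each participating database and compared side by side (dbo.YourTable is a placeholder; substitute the affected table):

```sql
-- Run on both the hub and the member database and compare the is_identity flag.
-- 'dbo' and 'YourTable' are placeholders for the affected schema and table.
SELECT
    SCHEMA_NAME(t.schema_id) AS schema_name,
    t.name                   AS table_name,
    c.name                   AS column_name,
    c.is_identity
FROM sys.columns AS c
JOIN sys.tables  AS t ON t.object_id = c.object_id
WHERE t.name = 'YourTable'
  AND SCHEMA_NAME(t.schema_id) = 'dbo'
ORDER BY c.column_id;
```

If is_identity is 1 on one side and 0 on the other for the same column, you have reproduced the mismatch that triggers error 8106 during SET IDENTITY_INSERT.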
Root Cause Analysis

After reviewing the table definitions across the sync topology, it was discovered that the synchronized table had an IDENTITY column defined on one side of the topology (hub or member) but not on the other. This schema mismatch led the sync service to attempt SET IDENTITY_INSERT operations during the bulk insert phase, which failed on the database where the column lacked the identity property.

Azure SQL Data Sync relies on consistent schema definitions across all participating databases. Any deviation, particularly involving identity columns, can interrupt data movement operations.

Mitigation Approach

To resolve the issue, the following corrective steps were applied:

1. Remove the affected table from the sync group and save the configuration.
2. Refresh the sync schema.
3. Recreate the table to include the appropriate IDENTITY property.
4. Add the corrected table back to the sync group.
5. Trigger a new sync operation.

These steps ensured that the table definitions were aligned across all sync participants, allowing the synchronization process to proceed successfully.

Best Practices to Avoid Similar Issues

To prevent identity-related sync failures in Azure SQL Data Sync:

✅ Ensure table schemas are identical across all participating databases before onboarding them into a sync group.
✅ Pay special attention to: IDENTITY properties, primary keys, data types, and nullability constraints.
✅ Always validate schema consistency when adding new tables to a sync group or modifying existing table definitions.

Final Thoughts

Schema mismatches, especially those involving identity columns, are a common but often overlooked cause of Data Sync failures. By ensuring consistent table definitions across your hub and member databases, you can significantly reduce the risk of synchronization errors and maintain reliable data movement across regions.

Troubleshooting Azure SQL Data Sync Groups Stuck in Progressing State
Azure SQL Data Sync is commonly used to synchronize data across Azure SQL Databases and on-premises SQL Server environments. While the service works well in many scenarios, customers may occasionally encounter a situation where a sync group remains stuck in a "Progressing" state and cannot be started, stopped, or refreshed. This blog walks through a real-world troubleshooting scenario, highlights the root cause, and outlines practical remediation steps based on an actual support investigation and collaboration.

Problem Overview

In this scenario, the customer reported that:

- The sync group was stuck in "Progressing" for multiple days
- Sync operations could not be started or stopped
- Tables could not be refreshed or reconfigured
- Azure Activity Logs showed operations as Succeeded, yet sync never progressed
- Our backend telemetry showed the sync group as Active, while hub and member databases were in a Reprovisioning state

The last successful sync occurred on XX day, after which the sync pipeline stopped making progress.

Initial Investigation Findings

During the investigation, several key observations were made:

1. High DATA IO Utilization
Telemetry and backend checks revealed that DATA IO utilization was pegged at 100% on one of the sync member databases starting XX day. Despite no noticeable change in application workload, the database was under sustained IO pressure, which directly impacted Data Sync operations.

2. Deadlocks During Sync Processing
Our backend telemetry showed repeated deadlock errors: "Transaction was deadlocked on lock resources with another process and has been chosen as the deadlock victim." These deadlocks were observed for multiple sync member IDs starting the same day IO saturation began. This aligned with the hypothesis that resource contention, not a Data Sync service failure, was the underlying issue.

3. Metadata Database Was Healthy
The sync metadata database was running on a serverless Azure SQL Database (1 vCore) and showed healthy resource usage, ruling it out as a bottleneck.

Recommended Troubleshooting Steps

Based on the findings, the following steps were recommended and validated:

✅ Step 1: Address Database Resource Constraints First
Before attempting to recreate or reset the sync group, the focus was placed on resolving DATA IO saturation on the affected database. Actions included:
- Scaling up the database (DTUs / vCores)
- Monitoring IO utilization after scaling
- Ensuring sufficient headroom for sync operations
This was identified as the primary remediation step.

✅ Step 2: Use the Azure SQL Data Sync Health Checker
The Azure SQL Data Sync Health Checker was recommended to validate:
- Sync metadata integrity
- Table-level configuration issues
- Agent and connectivity status
GitHub tool: AzureSQLDataSyncHealthChecker

✅ Step 3: Validate Sync Group and Agent State via PowerShell
PowerShell was used to confirm:
- Sync group state
- Last successful sync time
- On-premises sync agent connectivity

Example commands used:

```powershell
Get-AzureRmSqlSyncGroup `
    -ResourceGroupName "ResourceGroup01" `
    -ServerName "Server01" `
    -DatabaseName "Database01" | Format-List

Get-AzureRmSqlSyncAgent `
    -ResourceGroupName "ResourceGroup01" `
    -ServerName "Server01" | Select-Object ResourceGroupName, SyncState, LastSyncTime
```

(Note: the AzureRm module has since been superseded by the Az module; the equivalent cmdlets are Get-AzSqlSyncGroup and Get-AzSqlSyncAgent.)

Resolution

After the customer increased the database size, DATA IO utilization dropped, sync operations resumed normally, and the customer confirmed that the issue was resolved.

Recovering Missing Rows ("Gaps") in Azure SQL Data Sync — Supported Approaches (and What to Avoid)
Azure SQL Data Sync is commonly used to keep selected tables synchronized between a hub database and one or more member databases. In some cases, you may discover a data "gap": a subset of rows that exist in the source but are missing on the destination for a specific time window, even though synchronization continues afterward for new changes. This post explains supported recovery patterns and what not to do, based on a real support scenario where a customer reported missing rows for a single table within a sync group and requested a way to synchronize only the missing records.

The scenario: Data Sync continues, but some rows are missing

In the referenced case, the customer observed that:

- A specific table had a gap on the member side (missing rows for a period), while newer data continued to sync normally afterward.
- They asked for a Microsoft-supported method to sync only the missing rows, without rebuilding or fully reinitializing the table.

This is a reasonable goal, but the recovery method matters, because Data Sync relies on service-managed tracking artifacts.

Temptation: "Can we push missing data by editing tracking tables or calling internal triggers?"

A frequent idea is to "force" Data Sync to pick up missing rows by manipulating internal artifacts:

- Writing directly into Data Sync tracking tables (for example, tables under the DataSync schema such as *_dss_tracking), or altering provisioning markers.
- Manually invoking Data Sync–generated triggers or relying on their internal logic. The case discussion specifically referenced internal triggers such as _dss_insert_trigger, _dss_update_trigger, and _dss_delete_trigger.

Why this is not recommended / not supported as a customer-facing solution

In the case, the guidance from Microsoft engineering was clear: manually invoking internal Data Sync triggers is not supported and can increase the risk of data corruption, because these triggers are service-generated at runtime and are not intended for manual use.
Directly manipulating Data Sync tracking/metadata tables is likewise not recommended. The customer thread also highlights that these tracking tables are part of Data Sync internals, and using them for manual "push" scenarios is not a supported approach. The conversation raises an important conceptual point: tracking tables are part of how the service tracks changes; they are not meant to be treated as a user-managed replication queue.

Supported recovery option #1 (recommended): Re-drive change detection via the base table

The most supportable approach is to make Data Sync detect the missing rows through its normal change tracking path, by operating on the base/source table, not the service-managed internals.

A practical pattern: "no-op update" to re-fire tracking

In the internal discussion with the product team, the recommended pattern was to update the source/base table (even with a "no-op" assignment) so that Data Sync's normal tracking logic is triggered, without manually invoking internal triggers. Example pattern (conceptual):

```sql
UPDATE t
SET some_column = some_column  -- no-op: value unchanged
FROM dbo.YourTable AS t
WHERE <filter identifying the rows that are missing on the destination>;
```

This approach is called out explicitly in the thread as a way to "re-drive" change detection safely through supported mechanisms.

Operational guidance (practical):

- Apply the update in small batches, especially for large tables, to reduce transaction/log impact and avoid long-running operations.
- Validate the impacted row set first (for example, by comparing keys between hub and member).
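Putting those two operational tips together, the no-op update can be driven off a work queue of missing keys and applied in bounded batches. The sketch below is illustrative only: dbo.YourTable, its key column Id, the touched column some_column, and the #missing_keys temp table are all hypothetical names you would replace with your own.

```sql
-- Hypothetical sketch: dbo.YourTable, Id, some_column, and #missing_keys
-- are placeholders, not names from the original case.

-- 1) Build the work queue of keys present at the source but missing on the
--    destination (e.g., export keys from the member and compare with EXCEPT).
CREATE TABLE #missing_keys (Id INT PRIMARY KEY);
-- INSERT INTO #missing_keys (Id) SELECT Id FROM dbo.YourTable EXCEPT <member keys>;

-- 2) Re-fire tracking in small batches so each transaction stays short.
DECLARE @ids TABLE (Id INT PRIMARY KEY);

WHILE EXISTS (SELECT 1 FROM #missing_keys)
BEGIN
    DELETE FROM @ids;

    -- Move the next batch of keys out of the work queue.
    DELETE TOP (5000) FROM #missing_keys
    OUTPUT deleted.Id INTO @ids;

    -- No-op update: values are unchanged, but Data Sync's normal change
    -- tracking path still picks the rows up for the next sync cycle.
    UPDATE t
    SET t.some_column = t.some_column
    FROM dbo.YourTable AS t
    JOIN @ids AS b ON b.Id = t.Id;
END;
```

Batching this way keeps the transaction log impact per iteration small and lets you pause or resume the recovery at any point, since unprocessed keys remain in #missing_keys.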
Supported recovery option #2: Deprovision and re-provision the affected table (safe "reset" path)

If the gap is large, the row set is hard to isolate, or you want a clean realignment of tracking artifacts, the operational approach discussed in the case was:

1. Stop sync.
2. Remove the table from the sync group (so the service deprovisions tracking objects).
3. Fix/clean the destination state as needed.
4. Add the table back and let Data Sync re-provision and sync again.

This option is often the safest when the goal is to avoid touching system-managed artifacts directly.

Note: In production environments, customers may not be able to truncate/empty tables due to operational constraints. In that situation, the sync may take longer because the service might need to do more row-by-row evaluation. This tradeoff was discussed in the case context.

Diagnostics: Use the Azure SQL Data Sync Health Checker

When you suspect metadata drift, missing objects, or provisioning inconsistencies, the case recommended using the AzureSQLDataSyncHealthChecker script. This tool:

- Validates hub/member metadata and scopes against the sync metadata database
- Produces logs that can highlight missing artifacts and other inconsistencies
- Is intended to help troubleshoot Data Sync issues faster

A likely contributor to "gaps": schema changes during Data Sync (snapshot isolation conflict)

In the case discussion, telemetry referenced an error consistent with concurrent DDL/schema changes while the sync process is enumerating changes (snapshot isolation plus metadata changes). A well-known related error is SQL Server error 3961, which occurs when a snapshot isolation transaction fails because metadata was modified by a concurrent DDL statement, since metadata is not versioned. Microsoft documents this behavior and explains why metadata changes conflict with snapshot isolation semantics.

Prevention guidance (practical):

- Avoid running schema deployments (DDL) during active sync windows.
- Use a controlled workflow for schema changes with Data Sync: pause and coordinate changes to prevent mid-sync metadata shifts. (General best practices exist for ongoing Data Sync operations and maintenance.)

Key takeaways

- Do not treat Data Sync tracking tables/triggers as user-managed "replication internals." Manually invoking internal triggers or editing tracking tables is not a supported customer-facing recovery mechanism.
- Do recover gaps via base table operations (insert/update) so the service captures changes through its normal path; a "no-op update" is one practical pattern when you already know the missing row set.
- For large/complex gaps, consider the safe reset approach: remove the table from the sync group and re-add it to re-provision artifacts.
- Use the AzureSQLDataSyncHealthChecker to validate metadata consistency and reduce guesswork.
- If you see intermittent failures around deployments, consider the schema-change plus snapshot isolation pattern (e.g., error 3961) as a possible contributor and schedule DDL changes accordingly.

From our experience, when there are row gaps it is usually because of a change to the PK or to the source table.

Lesson Learned #517: Connection Timeouts to Azure SQL Database Using Private Endpoint with HikariCP
Recently, we have been working on a support case where our customer reported the following error message:

ERROR com.zaxxer.hikari.pool.HikariPool - HikariPool-1 - Exception during pool initialization.
com.microsoft.sqlserver.jdbc.SQLServerException: The TCP/IP connection to the host myserverX.database.windows.net, port 1433 has failed. Error: 'Connection timed out: no further information. Verify the connection properties. Make sure that an instance of SQL Server is running on the host and accepting TCP/IP connections at the port. Make sure that TCP connections to the port are not blocked by a firewall.'

I would like to share the lessons learned during the troubleshooting and resolution process.

Initially, what caught our attention were the first two messages: "Exception during pool initialization" and "Connection timed out". This indicates that Hikari was unable to create the connection pool during initialization (one of the first steps in connection pooling) due to a connection timeout. We therefore focused on connectivity, and asked whether the customer was using a public or a private endpoint; they confirmed private endpoint usage.

The following factors were identified as possible causes:

- Private Endpoint misconfiguration: the Private Endpoint was not correctly associated with the Azure SQL Database.
- DNS resolution issues: the database hostname was not resolving to the Private Endpoint's private IP address.
- Network Security Group (NSG) restrictions: the NSG attached to the subnet hosting the Private Endpoint was blocking inbound traffic on port 1433.
- Firewall settings on Azure SQL Database: firewall rules were not allowing connections from the source network.
- Redirect mode configuration: additional ports required for redirect mode were blocked or misconfigured.
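For the last item, once any session can connect (for example, from a jump box in the same VNet), you can check from T-SQL how that session actually reached the server, which helps confirm whether proxy or redirect behavior is in play. This check is a supplementary suggestion, not a step from the original case:

```sql
-- Shows how the current session reached the server. Under the default proxy
-- policy traffic stays on port 1433; under redirect, subsequent traffic moves
-- to a port in the redirect range, which must also be reachable.
SELECT c.session_id,
       c.net_transport,
       c.protocol_type,
       c.local_tcp_port,
       c.client_net_address
FROM sys.dm_exec_connections AS c
WHERE c.session_id = @@SPID;
```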
Our troubleshooting started by running nslookup to ensure that the database hostname resolves to the Private Endpoint's private IP address. If the IP returned is public instead of private, verify the DNS configuration or use Azure DNS.

```shell
nslookup myserverX.database.windows.net
```

The second step was to validate that the Java application host can reach Azure SQL Database on port 1433, or on the redirect port if the customer uses the redirect connection policy:

```shell
telnet myserverX.database.windows.net 1433
# or, on Linux:
nc -vz myserverX.database.windows.net 1433
```

With these commands we identified that the connection fails. Check Azure SQL Database and Azure Synapse Analytics connectivity architecture and Azure Private Link - Azure SQL Database & Azure Synapse Analytics | Microsoft Learn for more details.

We then verified whether the Network Security Group attached to the subnet allows traffic on port 1433, and found that no NSG rule existed for the Private Endpoint. Once the rule was added, the Java application was able to connect.

How to Capture the Actual Execution Plan in Azure SQL Database
This article describes how you can retrieve the actual execution plan of a query directly from the runtime statistics, without having to extract the query first and run it separately outside of the application.
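As a short illustration of one such technique (not necessarily the exact method the article uses): in Azure SQL Database and SQL Server 2019+, the last actual execution plan of a cached query can be read from sys.dm_exec_query_plan_stats once the LAST_QUERY_PLAN_STATS database-scoped configuration is enabled. The LIKE filter below is a placeholder you would replace with a fragment of the query you are investigating:

```sql
-- Enable lightweight capture of the last actual plan for this database.
ALTER DATABASE SCOPED CONFIGURATION SET LAST_QUERY_PLAN_STATS = ON;

-- Retrieve the last actual execution plan for matching cached queries.
SELECT t.text        AS query_text,
       ps.query_plan AS last_actual_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle)         AS t
CROSS APPLY sys.dm_exec_query_plan_stats(cp.plan_handle) AS ps
WHERE t.text LIKE '%YourQueryFragment%';  -- placeholder filter
```

The query_plan column contains the showplan XML with runtime ("actual") statistics, which can be opened directly in SQL Server Management Studio.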