Recovering Missing Rows (“Gaps”) in Azure SQL Data Sync — Supported Approaches (and What to Avoid)
Azure SQL Data Sync is commonly used to keep selected tables synchronized between a hub database and one or more member databases. In some cases, you may discover a data “gap”: a subset of rows that exist in the source but are missing on the destination for a specific time window, even though synchronization continues afterward for new changes. This post explains supported recovery patterns and what not to do, based on a real support scenario where a customer reported missing rows for a single table within a sync group and requested a way to synchronize only the missing records.

The scenario: Data Sync continues, but some rows are missing

In the referenced case, the customer observed that:

- A specific table had a gap on the member side (missing rows for a period), while newer data continued to sync normally afterward.
- They asked for a Microsoft-supported method to sync only the missing rows, without rebuilding or fully reinitializing the table.

This is a reasonable goal, but the recovery method matters, because Data Sync relies on service-managed tracking artifacts.

Temptation: “Can we push missing data by editing tracking tables or calling internal triggers?”

A frequent idea is to “force” Data Sync to pick up missing rows by manipulating internal artifacts:

- Writing directly into Data Sync tracking tables (for example, tables under the DataSync schema such as *_dss_tracking), or altering provisioning markers.
- Manually invoking Data Sync–generated triggers or relying on their internal logic. The case discussion specifically referenced internal triggers such as _dss_insert_trigger, _dss_update_trigger, and _dss_delete_trigger.

Why this is not recommended / not supported as a customer-facing solution

In the case, the guidance from Microsoft engineering was clear: manually invoking internal Data Sync triggers is not supported and can increase the risk of data corruption, because these triggers are service-generated at runtime and are not intended for manual use.
Directly manipulating Data Sync tracking/metadata tables is also not recommended. The customer thread highlights that these tracking tables are Data Sync internals: they are part of how the service tracks changes, not a user-managed replication queue, and using them for manual “push” scenarios is not a supported approach.

Supported recovery option #1 (recommended): Re-drive change detection via the base table

The most supportable approach is to make Data Sync detect the missing rows through its normal change tracking path, by operating on the base/source table rather than the service-managed internals.

A practical pattern: “no-op update” to re-fire tracking

In the internal discussion with the product team, the recommended pattern was to update the source/base table (even with a “no-op” assignment) so that Data Sync’s normal tracking logic is triggered, without manually invoking internal triggers. Example pattern (conceptual):

UPDATE t
SET some_column = some_column -- no-op: value unchanged
FROM dbo.YourTable AS t
WHERE <filter identifying the rows that are missing on the destination>;

This approach is called out explicitly in the thread as a way to “re-drive” change detection safely through supported mechanisms.

Operational guidance (practical):

- Apply the update in small batches, especially for large tables, to reduce transaction/log impact and avoid long-running operations.
- Validate the impacted row set first (for example, by comparing keys between hub and member).
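To make the key comparison and batching concrete, here is a minimal Python sketch of the idea. Everything in it is an illustrative assumption, not taken from the original case: a table dbo.YourTable with an integer primary key Id and a nullable column SomeColumn, and a batch size of 500. It computes the keys that exist on the hub but not on the member, then generates batched no-op UPDATE statements for you to review and run through your normal tooling.

```python
# Sketch only: table and column names (dbo.YourTable, Id, SomeColumn) are
# hypothetical placeholders; substitute your own schema. Keys are assumed to
# be integers so they can be embedded safely in an IN (...) list.

def missing_keys(hub_keys, member_keys):
    """Keys present on the hub but absent on the member, in stable order."""
    member = set(member_keys)
    return [k for k in hub_keys if k not in member]

def batched_noop_updates(keys, batch_size=500):
    """Yield one no-op UPDATE statement per batch of missing keys."""
    for i in range(0, len(keys), batch_size):
        batch = keys[i:i + batch_size]
        in_list = ", ".join(str(int(k)) for k in batch)
        yield (
            "UPDATE dbo.YourTable "
            "SET SomeColumn = SomeColumn "  # no-op: value unchanged
            f"WHERE Id IN ({in_list});"
        )

if __name__ == "__main__":
    hub = [1, 2, 3, 4, 5, 6]
    member = [1, 2, 5, 6]
    gaps = missing_keys(hub, member)
    for stmt in batched_noop_updates(gaps, batch_size=2):
        print(stmt)
```

Small batches keep each transaction short, which matters on large tables where a single wide no-op update could bloat the log and hold locks for a long time.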
Supported recovery option #2: Deprovision and re-provision the affected table (safe “reset” path)

If the gap is large, the row set is hard to isolate, or you want a clean realignment of tracking artifacts, the operational approach discussed in the case was:

1. Stop sync.
2. Remove the table from the sync group (so the service deprovisions tracking objects).
3. Fix/clean the destination state as needed.
4. Add the table back and let Data Sync re-provision and sync again.

This option is often the safest when the goal is to avoid touching system-managed artifacts directly.

Note: In production environments, customers may not be able to truncate/empty tables due to operational constraints. In that situation, the sync may take longer because the service might need to do more row-by-row evaluation. This tradeoff was discussed in the case context.

Diagnostics: Use the Azure SQL Data Sync Health Checker

When you suspect metadata drift, missing objects, or provisioning inconsistencies, the case recommended using the AzureSQLDataSyncHealthChecker script. This tool:

- Validates hub/member metadata and scopes against the sync metadata database
- Produces logs that can highlight missing artifacts and other inconsistencies
- Is intended to help troubleshoot Data Sync issues faster

A likely contributor to “gaps”: schema changes during Data Sync (snapshot isolation conflict)

In the case discussion, telemetry referenced an error consistent with concurrent DDL/schema changes while the sync process is enumerating changes (snapshot isolation plus metadata changes). A well-known related error is SQL Server error 3961, which occurs when a snapshot isolation transaction fails because metadata was modified by a concurrent DDL statement, since metadata is not versioned. Microsoft documents this behavior and explains why metadata changes conflict with snapshot isolation semantics.

Prevention guidance (practical)

- Avoid running schema deployments (DDL) during active sync windows.
- Use a controlled workflow for schema changes with Data Sync: pause/coordinate changes to prevent mid-sync metadata shifts. (General best practices exist for ongoing Data Sync operations and maintenance.)

Key takeaways

- Do not treat Data Sync tracking tables/triggers as user-managed “replication internals.” Manually invoking internal triggers or editing tracking tables is not a supported customer-facing recovery mechanism.
- Do recover gaps via base table operations (insert/update) so the service captures changes through its normal path; a “no-op update” is one practical pattern when you already know the missing row set.
- For large/complex gaps, consider the safe reset approach: remove the table from the sync group and re-add it to re-provision artifacts.
- Use the AzureSQLDataSyncHealthChecker to validate metadata consistency and reduce guesswork.
- If you see intermittent failures around deployments, consider the schema-change plus snapshot isolation pattern (e.g., error 3961) as a possible contributor and schedule DDL changes accordingly.

In our experience, when there are row gaps it is usually because of a change in the primary key or the source table.

Lesson Learned #517: Connection Timeouts to Azure SQL Database Using Private Endpoint with HikariCP
Recently, we have been working on a support case where our customer reported the following error message:

ERROR com.zaxxer.hikari.pool.HikariPool - HikariPool-1 - Exception during pool initialization.
com.microsoft.sqlserver.jdbc.SQLServerException: The TCP/IP connection to the host myserverX.database.windows.net, port 1433 has failed. Error: 'Connection timed out: no further information. Verify the connection properties. Make sure that an instance of SQL Server is running on the host and accepting TCP/IP connections at the port. Make sure that TCP connections to the port are not blocked by a firewall.'

I would like to share the lessons learned during the troubleshooting and resolution process.

Initially, what caught our attention were the first two messages: “Exception during pool initialization” and “Connection timed out”. This indicates that HikariCP was unable to create the connection pool during initialization (one of the first steps in connection pooling) due to a connection timeout. Therefore, we began working on connectivity options and started investigating by asking whether our customer was using a public endpoint or a private endpoint; they confirmed private endpoint usage.

The following factors could be identified as possible causes:

- Private Endpoint misconfiguration: the Private Endpoint was not correctly associated with the Azure SQL Database.
- DNS resolution issues: the database hostname was not resolving to the Private Endpoint's private IP address.
- Network Security Group (NSG) restrictions: the NSG attached to the subnet hosting the Private Endpoint was blocking traffic on port 1433.
- Firewall settings on Azure SQL Database: firewall rules were not allowing connections from the source network.
- Redirect mode configuration: additional ports required for redirect mode were blocked or misconfigured.
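For context on where the “pool initialization” timeout comes from, a minimal HikariCP configuration for this kind of setup might look like the following sketch. All values here are illustrative assumptions, not taken from the customer's case; connectionTimeout and initializationFailTimeout are the standard HikariCP settings that govern how long the pool waits before throwing the exception shown above.

```properties
jdbcUrl=jdbc:sqlserver://myserverX.database.windows.net:1433;databaseName=mydb;encrypt=true;loginTimeout=30
username=myuser
password=...
maximumPoolSize=10
# Milliseconds a single connection attempt may take before timing out
connectionTimeout=30000
# Fail fast at startup if the first connection cannot be established
initializationFailTimeout=30000
```

Tuning these values only changes how quickly the failure surfaces; when the network path itself is broken, as in this case, the fix has to happen at the DNS/NSG/firewall layer.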
Our troubleshooting started by running nslookup myserverX.database.windows.net to ensure that the database hostname resolves to the Private Endpoint's private IP address. If the IP returned is public instead of private, verify the DNS configuration or use Azure DNS.

Our second step was to validate that the Java application host can reach Azure SQL Database on port 1433, or on the redirect ports if the customer uses the redirect connection policy, using the command telnet myserverX.database.windows.net 1433 or the Linux command nc -vz myserverX.database.windows.net 1433, and we identified that the connection fails. Check Azure SQL Database and Azure Synapse Analytics connectivity architecture and Azure Private Link - Azure SQL Database & Azure Synapse Analytics | Microsoft Learn for more details.

Next, we verified whether the Network Security Group attached to the subnet allows traffic on port 1433, and we found that no NSG rule existed for the private endpoint. Once the rule was added, the Java application was able to connect.

How to connect to Azure Database for MySQL – Flexible Server in a private network configuration
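The nslookup and telnet/nc checks above can also be scripted. The following Python sketch (an assumption for illustration; only the server name comes from the case) resolves a hostname, classifies the address as private or public, and attempts a TCP connection, mirroring what the manual commands verify:

```python
import ipaddress
import socket

def is_private_ip(ip):
    """True when the address is private (expected with a Private Endpoint);
    a public address suggests a DNS configuration problem."""
    return ipaddress.ip_address(ip).is_private

def resolve(host):
    """Resolve a hostname to an IP address, like nslookup."""
    return socket.gethostbyname(host)

def can_reach(host, port=1433, timeout=5.0):
    """Attempt a TCP connection, like `telnet host 1433` or `nc -vz`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Usage (against the server from the case):
#   ip = resolve("myserverX.database.windows.net")
#   print(ip, is_private_ip(ip))
#   print(can_reach("myserverX.database.windows.net", 1433))
```

If the redirect connection policy is in use, run the reachability check against the additional redirect port range as well, not only 1433.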
You might encounter connectivity issues when trying to access your database from different sources and networks. In this blog post, I’ll show you how to troubleshoot and resolve these types of issues based on three common network scenarios, as well as for an on-premises scenario.

Why should you migrate to PostgreSQL 15?
Recently one of my customers shared the below code snippet, and I was looking for ways to gain performance.

The code snippet

A simple function that evaluates an expression in a loop for one billion iterations:

create or replace function speed_test(p_loops int)
returns int
language plpgsql
as $$
declare
    v_number bigint;
    v_multiplier float = 3.14159;
    loop_cnt bigint;
begin
    for loop_cnt in 1..p_loops * 1000000 loop
        v_number := 333;
        v_number := v_number * v_multiplier;
    end loop;
    return 0;
end;$$;

The code caught my attention because it contains evaluated expressions, and this was one of the areas where improvements were made after PostgreSQL 12. I ran a simple test to prove this.

Test run on PostgreSQL v12

I ran the code on PostgreSQL v12 in Azure Database for PostgreSQL Flexible Server.

Region: West Europe
SKU: GP D2ds_v4, 2 vCores, 8 GB RAM, 128 GB storage

Test run on PostgreSQL v15

The same piece of code was run on PostgreSQL v15 in Azure Database for PostgreSQL Flexible Server.

Region: West Europe
SKU: GP D2ds_v4, 2 vCores, 8 GB RAM, 128 GB storage

We can clearly see that the code snippet on PostgreSQL 15 is 5x faster than on PostgreSQL 12. The rationale behind this is several patches that reduced the overhead of evaluating expressions in PL/pgSQL. PL/pgSQL was not originally designed for extensive numeric calculations like this, but the need was realized as more and more developers started to use it this way, so over time there were several commits by the community that led to this performance improvement. If you are curious to know more, check out this and this commit, which describe the improvements and the history of this change. Thanks to Pavel Stehule from the PostgreSQL community for pointing out the rationale and consolidating the commits associated with this change.

If you are on an older version of PostgreSQL, like PostgreSQL 12, this is just one of many reasons to upgrade to PostgreSQL 15. You can see the feature matrix here.
PostgreSQL 15 will be released on Azure Database for PostgreSQL Flexible Server soon. If your workload is running on an older version of PostgreSQL, plan your upgrade using the Major Version Upgrade.

New troubleshooting video series for Azure Database for MySQL
We’re pleased to announce our latest video series, focused on troubleshooting potential issues with Azure Database for MySQL. With a new episode releasing every couple of weeks, the Azure Database for MySQL - Troubleshooting Series provides tips and tricks that you can use to anticipate and address potential issues in specific problem areas, such as performance, connectivity, replication latency, database corruption, and more!

How to Capture the Actual Execution Plan in Azure SQL Database
This article describes how you can retrieve the actual execution plan of a query directly from the runtime statistics, without having to extract the query first and run it separately outside of the application.

Spotlight on the ADX Time Pivot Visualization
The Time Pivot chart is a powerful interactive navigation tool that lets you analyze your data over an events timeline that pivots on a time axis. Time Pivot makes it easy to identify event flows and beautifully tells the story of “what happened”. It helps you generate multiple views of the same data by letting you pick multiple levels of data slicing. The Time Pivot visualization is your essential tool for analyzing data across hundreds of different tables.