Best Practices for Azure SQL DataSync 2.0 tasks


Mar 14, 2019

First published on MSDN on Jul 25, 2017
The Azure SQL DataSync workflow is carried out by different tasks at different stages. The first task provisions the destination databases (initial synchronization). Data changes (insert/update/delete) are then propagated to the destination by sync tasks. If the dataset or schema changes, re-provisioning occurs. De-provisioning runs when a database is removed from the DataSync topology. Below are some "best practice" recommendations for each of these tasks:

Initial Synchronization

The provisioning task creates the customer table schemas defined in the dataset. For each table, it also creates a tracking table and a set of stored procedures and triggers, and then populates the tables. After provisioning, the source and destination databases contain the same data, as defined by the dataset.

Do not remove these system objects; doing so can break the sync and leave the database in an unpredictable state.

For the best performance, start the initial synchronization with data in the source databases only, whenever possible. If both the source and destination tables contain data, the provisioning task treats each row as a conflict and goes through conflict resolution, which makes provisioning much slower.

Data Change

Changes are applied to the destination in batches. For each batch, DataSync calls a stored procedure that takes a TVP (table-valued parameter). The changed rows are passed in the TVP so they do not have to be sent one by one. See https://docs.microsoft.com/en-us/dotnet/framework/data/adonet/sql/table-valued-parameters
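To illustrate why batching matters, here is a minimal sketch, assuming a hypothetical `apply_batch` callback that stands in for one stored procedure call carrying a TVP. The names `make_batches`, `apply_changes`, and `batch_size` are illustrative only; the batch size DataSync actually uses is not stated in this post.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Tuple

Row = Tuple[int, str]  # hypothetical (primary key, payload) shape of a changed row


def make_batches(changes: Iterable[Row], batch_size: int) -> Iterator[List[Row]]:
    """Yield consecutive lists of at most batch_size changed rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(changes)
    while batch := list(islice(it, batch_size)):
        yield batch


def apply_changes(changes: Iterable[Row], batch_size: int,
                  apply_batch: Callable[[List[Row]], None]) -> int:
    """Call apply_batch once per batch and return the number of calls made."""
    calls = 0
    for batch in make_batches(changes, batch_size):
        # Hypothetical stand-in for one stored procedure call with a TVP.
        apply_batch(batch)
        calls += 1
    return calls
```

With 2,500 changed rows and a batch size of 1,000, this makes 3 calls instead of 2,500 row-by-row calls.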

Avoid synchronization loops. A synchronization loop forms when databases are added to two or more sync groups in such a way that a change made in a database through one sync group is written back to the same database by another sync group. Two scenarios lead to a synchronization loop:

A circular reference involving two or more sync groups.

A single database registered with different agents and added to a sync group through two or more of those agents.
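Both scenarios can be checked before changing a topology. The sketch below is a simplified model, not the service's own validation: sync groups and physical databases are graph nodes, each membership is an edge, and any cycle means a change can travel back into the database it came from. It assumes you have already resolved each agent registration to its physical database name (so scenario 2 shows up as the same name listed twice in one group), and it ignores which tables each group syncs. `find_sync_loop` is a hypothetical helper.

```python
from __future__ import annotations

from collections.abc import Mapping, Sequence


def find_sync_loop(groups: Mapping[str, Sequence[str]]) -> tuple[str, str] | None:
    """Return the first (group, database) membership that closes a loop, else None.

    groups maps a sync group name to the physical databases in it (hub included).
    Uses union-find: joining two nodes that are already connected closes a cycle.
    """
    parent: dict[str, str] = {}

    def find(node: str) -> str:
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for group, members in groups.items():
        for db in members:
            root_group, root_db = find("group:" + group), find("db:" + db)
            if root_group == root_db:
                return group, db  # this membership closes a loop
            parent[root_group] = root_db
    return None
```

For example, groups A (db1, db2), B (db2, db3) and C (db3, db1) form a circular reference, and the function reports `("C", "db1")`.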

Schedule the sync task carefully, based on the amount of data to sync. If the current scheduled sync does not complete within the scheduled interval, the sync will not start at the next scheduled time. One scheduled run is skipped, and more data accumulates for the next one. The result can be a much longer delay than expected.
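The compounding effect can be seen with a small simulation. This is a hypothetical model, not how DataSync measures work: it assumes changes arrive at a constant `change_rate` (rows/min), each run applies the rows accumulated since the previous run started at a constant `throughput` (rows/min), and any scheduled tick that falls while a run is still in progress is skipped.

```python
import math
from typing import List, Tuple


def simulate_schedule(interval_min: int, change_rate: int, throughput: int,
                      runs: int) -> List[Tuple[int, int, float]]:
    """Return (start_minute, rows_applied, finish_minute) for each sync run.

    Simplified, hypothetical model: see the assumptions in the lead-in.
    """
    results = []
    previous_start, start = 0, interval_min
    for _ in range(runs):
        pending = change_rate * (start - previous_start)  # rows accumulated so far
        finish = start + pending / throughput
        results.append((start, pending, finish))
        previous_start = start
        # Next run starts at the first tick after this start that is not
        # inside the current run; ticks during the run are skipped.
        start = max(start + interval_min,
                    math.ceil(finish / interval_min) * interval_min)
    return results
```

With a 60-minute interval and 100 rows/min of changes, a throughput of 110 rows/min keeps runs at minutes 60, 120, 180. At 90 rows/min each run overruns its interval, so runs start at minutes 60, 180, 360 and apply 6,000, 12,000, then 18,000 rows.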

The sync task checks the cancellation flag after each batch is applied, so you can safely stop a sync that is in progress. The next batch is applied when you start the sync again. Be aware that cancelling a sync leaves the sync group in a partially synced state, so resume the sync as soon as possible.
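The behavior described above, where cancellation is honored only between batches, can be sketched as follows. This is an illustrative model, not DataSync's implementation: `run_sync` is hypothetical, `threading.Event` plays the role of the cancellation flag, and the returned index stands for the point where a restarted sync resumes.

```python
import threading
from typing import Callable, Sequence, TypeVar

B = TypeVar("B")


def run_sync(batches: Sequence[B], apply_batch: Callable[[B], None],
             cancel: threading.Event, start_at: int = 0) -> int:
    """Apply batches in order, starting at index start_at.

    The cancellation flag is checked only after a batch has been applied,
    so a batch is never interrupted halfway. Returns the index of the next
    batch to apply; it equals len(batches) when the sync has finished.
    """
    next_batch = start_at
    while next_batch < len(batches):
        apply_batch(batches[next_batch])
        next_batch += 1
        if cancel.is_set():
            break  # stop between batches; the rest waits for a restart
    return next_batch
```

A stop requested while batch 2 of 4 is being applied takes effect after that batch; calling `run_sync` again with `start_at=2` applies the remaining two.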

Re-provisioning

While data sync is in use, performing any of the operations below causes the database to go through re-provisioning:

Add or remove tables

Add or remove columns

Change the width of a column

Change the data type of a column
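Before applying a schema change, it can help to see which of the operations in this list it contains. The sketch below compares two schema snapshots and reports each triggering change. The snapshot shape (table to column to data type and width) and the helper name `reprovision_reasons` are hypothetical; how you capture the snapshots is left open, and they should cover only the tables and columns in the sync group's dataset.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical snapshot shape: table -> column -> (data type, width or None).
Schema = Dict[str, Dict[str, Tuple[str, Optional[int]]]]


def reprovision_reasons(old: Schema, new: Schema) -> List[str]:
    """List the changes between two snapshots that trigger re-provisioning."""
    reasons = []
    for table in sorted(old.keys() - new.keys()):
        reasons.append(f"table removed: {table}")
    for table in sorted(new.keys() - old.keys()):
        reasons.append(f"table added: {table}")
    for table in sorted(old.keys() & new.keys()):
        old_cols, new_cols = old[table], new[table]
        for col in sorted(old_cols.keys() - new_cols.keys()):
            reasons.append(f"column removed: {table}.{col}")
        for col in sorted(new_cols.keys() - old_cols.keys()):
            reasons.append(f"column added: {table}.{col}")
        for col in sorted(old_cols.keys() & new_cols.keys()):
            (old_type, old_width), (new_type, new_width) = old_cols[col], new_cols[col]
            if old_type != new_type:
                reasons.append(f"data type changed: {table}.{col}")
            elif old_width != new_width:
                reasons.append(f"width changed: {table}.{col}")
    return reasons
```

An empty result means the change does not touch the dataset's shape; a long result is a hint to split the change into smaller units.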

Unlike the sync task, the re-provisioning task does not check the cancellation flag. Once the task starts, it continues until it completes. Any "stop sync" request issued during the process is ignored.

Because re-provisioning tasks are resource intensive, consider scaling the database to a higher performance level during re-provisioning so the task completes as quickly as possible. Where possible, also consider grouping the re-provisioning changes into smaller units and applying one unit at a time.

De-provisioning

Removing a database from a sync group starts de-provisioning, which removes all the tracking tables, stored procedures, and triggers created in the user database, as well as the metadata on the service side.

If de-provisioning fails for any reason, an error status is displayed for the database. You can perform the task manually with DeprovisioningUtil.exe, which is installed with the local agent.

Default path: C:\Program Files (x86)\Microsoft SQL Data Sync 2.0\bin

A sync group is marked "out-of-date" when any change within the sync group fails to propagate to the rest of the sync group for 45 days or more. When this happens, you need to delete the sync group (which de-provisions its databases) and recreate it, starting again from initial synchronization.
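Since recovering from the out-of-date state means starting over, it is worth watching the 45-day window. A minimal sketch, assuming you already know when changes last propagated successfully (how you obtain that timestamp is outside the scope of this example); both helper names are hypothetical.

```python
from datetime import datetime, timedelta

OUT_OF_DATE_AFTER = timedelta(days=45)  # threshold stated in this post


def days_until_out_of_date(last_successful_sync: datetime, now: datetime) -> float:
    """Days left before the 45-day threshold; zero or less means it is reached."""
    return (last_successful_sync + OUT_OF_DATE_AFTER - now) / timedelta(days=1)


def is_out_of_date(last_successful_sync: datetime, now: datetime) -> bool:
    """True once changes have failed to propagate for 45 days or more."""
    return now - last_successful_sync >= OUT_OF_DATE_AFTER
```

Alerting well before `days_until_out_of_date` reaches zero leaves time to fix the failing member instead of recreating the sync group.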

Updated Mar 14, 2019

Version 2.0

Data sync

Azure-DB-Support-Team


Joined March 14, 2019


Azure Database Support Blog


1 Comment

AndyB1978
Jul 14, 2021
You mention that reprovisioning is resource intensive. Which database is it intensive against?
Typically we have 3 databases in our sync process. There is the AzureSQL database, the HUB database (resides in AzureSQL, but different to the data source) and also an OnPremise database. Which of these DBs would need additional resource to improve reprovisioning performance? There is also a VM which holds the Microsoft Data Sync 2.0 Application. This box is very low spec and seems to do very little throughout a sync. Is this normal?

I have the same question regarding the Initial Synchronization.
When performing rebuilds of this, I find that this is quite slow and am wondering where the bottleneck might be:
1) AzureSQL database
2) Hub Database
3) VM that holds Microsoft Data Sync 2.0 Application
4) Destination location (On Premise SQL Database)
I understand that improving HUB performance will make the GUI more performant, but once we press the SYNC button, it seems to just chug along at its own pace, with nothing looking particularly taxed, yet it still takes 14-16 hours to sync. (During which time we have zero way of monitoring progress).
Is there any particular area I could look at to improve performance on to improve the sync speed?

Thanks