Replication is a set of technologies for copying and distributing data and database objects from one database to another and then synchronizing between databases to maintain consistency. Transactional replication is typically used in server-to-server scenarios that require high throughput, including improving scalability and availability, data warehousing and reporting, integrating data from multiple sites, integrating heterogeneous data, and offloading batch processing.
SQL Replication can be used as a viable tool for some migration scenarios, especially when there is a need to move transactions from the source database to the target continuously. There are heterogeneous sources involved, and the target database must accept updates while data is still migrating. This blog will discuss how to utilize SQL Server Transactional replication to migrate an extensive (5-10 TB) database to the cloud over a standard internet link from an On-Premises source location to the Azure cloud.
SQL Replication is a perfect use case for specific migration scenarios involving Azure SQL Database and Azure SQL Database Hyperscale as targets when the source database has the correct compatibility requirements for Transactional Replication. The technique presented here optimizes Snapshot delivery, helps minimize cut-off time, increases decoupling of replication processes, and leverages the ability to replicate stored procedure execution during the migration process to reduce data transfer requirements.
This blog's proposed solution decouples the replication agents' execution, allowing us to generate, compact the snapshot folder, manually upload the snapshot files to the cloud, remotely applying the database snapshot, and then resume the normal replication process in a controlled fashion.
This way, we can apply the snapshot from an optimal network location, allowing for the best throughput possible to be achieved, reducing the required time for snapshot delivery.
Transactional replication will then maintain the target database in close sync with the source database, allowing reduced cut-off requirements.
Visual description of the overall process:
Required network connectivity
Tables published for transactional replication must have a primary key. If a table is in a transactional replication publication, you cannot disable any indexes associated with primary key columns. This document will describe a workaround for tables without primary keys.
The source database is accessible to users during the creation of the publication and the initial snapshot. However, it is advisable to create publications during periods of lower activity on the Publisher.
The snapshot files require significant storage space that can, in some instances, be larger than the original database size.
Especially for large data transfers over the internet, it is recommended to have stable and faster internet connectivity.
Caution about the source database's transaction log size as the log file grows until we start the Distribution Agent and complete applying the snapshot on the target. Logfile growth depends on factors such as how busy and how extensive the database is.
We will utilize a customized Transaction Replication strategy that uses a remote snapshot folder and manually controls snapshot creation and delivery.
After we manually initialize the subscription, we will resume the continuous replication process, including replicating the execution of stored procedures.
The process includes generating the Transactional Replication Snapshot during off-peak hours to a local snapshot folder on-premises, then compacting and uploading the snapshot to the cloud.
We will then execute the distribution agent to deliver the snapshot from the virtual machine on the Azure Cloud.
Moving the snapshot data manually to the cloud and executing it from a virtual machine in the cloud allows us to optimize the upload process by compacting and splitting the snapshot folder contents as needed. Besides, applying the snapshot from an optimal network stance will improve throughput and reduce initialization duration.
After the initialization concludes via this customized snapshot delivery process, we resume normal replication from the on-premises Publisher/Distributor. We are free to delete the Azure Virtual Machine and Storage Account.
If you migrate from a source already in the Azure Cloud to a target in the same Azure region, you do not need to move the snapshot manually. Just skip the steps to move the snapshot and run the distributor agent on a virtual machine.
Additional optimization agent parameter settings should optimize subscriber initialization and continuous synchronization performance. Plus, leveraging the replication of stored procedures should minimize network bandwidth usage.
Local on-premises Resources:
Publisher = On-premises SQL Server where source database resides.
Distributor = On-premises remote SQL Server distributor to avoid increasing load on the Publisher.
Subscriber = This is target Azure SQL Database.
Azure SQL VM = Azure Virtual Machine with SQL Server that is the same version as the SQL Server Distributor. We will only use the File System and execute the SQL Server Distrib.exe replication agent program from this virtual machine. This virtual machine should exist in the same datacenter and as close to the target database as possible.
Azure Storage = Azure storage in the same region as our Azure Virtual Machine.
Choose a Windows version that supports Azure File Shares and SMB 3.0 and is compatible with your SQL Server Version.
Check supported versions for your Distributor: Azure SQL Server replication to Azure SQL Database - Azure SQL Database | Microsoft Docs
Select a local folder on the Distributor Server to serve as the Snapshot folder destination: Configure Distribution - SQL Server | Microsoft Docs
Select the option to replicate stored procedure execution
Set the article option to replicate the stored procedure execution.
Do not check the box to execute the Snapshot Agent on the next screen.
-Subscriber [<Your-Server>.DATABASE.WINDOWS.NET] -SubscriberDB [<Your-Target-Database>] -Publisher [<Your-Publisher-Server>] -Distributor [<Your-Distributor-Server] -DistributorSecurityMode 1 -Publication [<Your-Publication>] -PublisherDB [<Your-Source-Database>] -Continuous
Alternatively, you can use SQL Server Standard Authentication to connect to the distributor, requiring only port 1433 connectivity between Azure and On-Premises
CD C:\Program Files\Microsoft SQL Server\140\COM
Distrib.exe -Subscriber [<Your-Server>.DATABASE.WINDOWS.NET] -SubscriberDB [<Your-Target-Database>] -SubscriberSecurityMode 0 –SubscriberLogin <Your-Target-Database-user> –SubscriberPassword <Your-Target-Database-password> -Publisher [<Your-Publisher-Server>] -Distributor [<Your-Distributor-Server] -DistributorSecurityMode 1 -Publication [<Your-Publication>] -PublisherDB [<Your-Source-Database>] –SubscriptionStreams 8 -CommitBatchSize 150 -AltSnapshotFolder <Your-Alternate-Snapshot-Folder- Location>
Remove -Continuous so the agent stops after the snapshot is delivered
SQL Transaction replication requires that tables must have a primary key to be included in as articles in a transactional replication setup. However, it can also replicate views, indexed views, stored procedures, user-defined functions, and stored procedure execution. As we cannot replicate tables without primary keys, we will create indexed views and publish these instead.
The solution will be:
If you have feedback or suggestions for improving this data migration asset, please contact the Data Platform Engineering Team. Thanks for your support!
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.