Database migration from Managed Instance to SQL Server with transactional replication
Introduction
When we talk about large databases with hundreds of GBs and significant number of database objects, when transactional consistency is required and migration needs to be done in an online fashion with short down time, Transactional replication is currently the most robust and advised way to perform migration from Azure SQL Managed Instance to SQL Server whether it’s hosted on IaaS or on-premises.
Transactional replication has been around for many years, and there is lots of documentation and other kind of resources where you can learn more about it. This blog post will focus on using transactional replication to migrate a database from Managed Instance to SQL Server.
Transactional replication overview
If you haven’t had experience with Transactional replication so far, here is a brief overview of this technology from database migration point of view. There will be three subjects that will participate in the replication process:
- Publisher is the source of the replication, and for us this will be Managed Instance.
- Distributor is coordinating replication, and this can be either Managed Instance or SQL Server.
- Subscriber is replication target, and this will be SQL Server, as we migrate to SQL Server.
Transactional replication is based on three SQL Agent jobs:
- Snapshot Agent (gets the initial database snapshot).
- Log Reader Agent (tracks source database changes after the snapshot is taken).
- Distribution Agent (applies the database changes to target).
Replication can be done in Push or Pull model. Push means that distributor is on the Publisher side, pushing the replication towards Subscriber. In Pull model, distributor is on the Subscriber side, pulling the replication from the Publisher. Each approach has its pros and cons. Distributor needs some resources to run, so it will have some performance impact to the side where it is deployed. Having distributor on a Publisher side means that replication is delivered continuously. If it is on the Subscriber side then the changes are delivered periodically, per schedule. In rare cases when both source and target don’t have enough resources to run the distributor, third option is available – to host distributor on a third server, and it can be either Managed Instance or SQL Server. In this case changes are also delivered periodically.
Transactional replication setup
In this section we will setup a transactional replication for the purpose of migrating database from Managed Instance to SQL Server 2019 hosted on Azure VM. Managed Instance will be both Publisher and Distributor, while SQL Server will be Subscriber. This topology will enable us to perform online migration, as changes from publisher to subscriber are delivered continuously.
Prerequisites
For replication to work, there must be a network connectivity between Managed Instance and SQL Server. In our case this is ensured by having both servers on the same Virtual Network. If that were not the case then VNet peering setup would be required.
Outbound TCP port 445 needs to be open for transactional replication, so in case you have NSG make sure this port is allowed. Allowing this port will enable Managed Instance to access Azure Storage account, which will be covered a bit later. To allow the port, go to the virtual network of your Managed Instance and identify the security group name.
Then find that Security group in the Azure portal, and in its Settings section open Outbound security rules. Click Add and add outbound security rule for port 445.
Transactional replication will require a storage account, so if you don’t have one yet, use the Azure Portal to create it. Alternatively, you can use one of your existing storage accounts. Within the Storage account, create a File Share with quota of 20 GiB.
Go to the File share Properties and copy the storage URL because this value will be needed in the following T-SQL script.
Also, go to the Settings of the Storage account, to Access keys section and copy Connection string, that will also be needed for the T-SQL script.
Once the prerequisites are met, we will setup the transactional replication by creating distribution, publication and finally a subscription.
Configure replication distribution
Storage URL and connection string we’ve already copied will be needed in the following T-SQL script.
DECLARE @distribution_db_name NVARCHAR(50) = N'distribution'
-- Assign the storage connection string you got from the Azure Portal.
DECLARE @storage_conn_string NVARCHAR(MAX) = N'DefaultEndpointsProtocol=https;AccountName=rmstoragesp01;AccountKey=asdf000000000000000000000000000000000000000000000000000000000000000000000000000000000a==;EndpointSuffix=core.windows.net'
-- Assign the storage URL you got from the Azure Portal.
DECLARE @storage_url NVARCHAR(MAX) = N'https://rmstoragesp01.file.core.windows.net/rm-transactinoal-replication-sp-01'
-- Adjust storage URL format.
DECLARE @working_dir NVARCHAR(MAX) = SUBSTRING(@storage_url, 9, LEN(@storage_url) - 8)
SET @working_dir = '\\' + REPLACE(@working_dir, '/', '\')
PRINT @working_dir
-- With following three commands you will set distributor server, distributor database name and configure publisher to use a specified distribution database.
--
EXEC sp_adddistributor @distributor = @@SERVERNAME
EXEC sp_adddistributiondb @database = @distribution_db_name
EXEC sp_adddistpublisher @publisher = @@SERVERNAME,
@distribution_db = @distribution_db_name,
@security_mode = 0,
@login = N'login_name',
@password = N'password',
@working_directory = @working_dir,
@storage_connection_string = @storage_conn_string
Once this is set you can run following three commands to verify the settings.
EXEC sp_helpdistributor
GO
EXEC sp_helpdistributiondb
GO
EXEC sp_helpdistpublisher
GO
If you want to drop any of these settings use stored procedures sp_dropdistpublisher, sp_dropdistributiondb and sp_dropdistributor, or in case you need to reset the settings entirely you can run EXEC sp_dropdistributor @no_checks = 1, @ignore_distributor = 1.
Configure replication publication
To create publication, use SSMS, connect to the Managed Instance with the private endpoint (with public endpoint publication and replication will not work). In the Object Explorer go to Replication, Local Publications and finally New Publication as on the screen shoot below.
This will open a New Publication Wizard. Click Next, select the database you want to publish and again click Next.
As on the screenshot below, for Publication Type chose Transactional publication and click Next. Then on the Articles window, choose objects you want to replicate. In this example we will chose all Tables, Stored Procedures, Views, Indexed Views and only ufnLeadingZeros from User Defined Functions. Once this is selected, click on the Article properties button.
On Article Properties, select Set Properties of All Table Articles, and then in the menu find properties Copy full text indexes and Copy XML indexes and set their values to True.
Wizard will warn about potential problems on the Article Issues window. We will ignore these and click Next. On the Filter Table Rows window we don’t add any filters, just click Next.
On Snapshot Agent windows check “Create a snapshot immediately…” and click Next. On Agent Security windows click on Security Settings to provide credentials for both Snapshot Agent and connection to the Publisher.
On the Wizard Actions window check Create the Publication and click Next.
Finally in the Complete the Wizard, provide the name for the publication and click Finish. Wizard will show progress and once the process is done you should see a Success sign with 0 Errors and 0 Warnings. Click the Close button. With this the publication is successfully created.
To check the status of the publication, refresh the Local Publications in SSMS and you will find newly created publication. From the context menu of the newly created Publication, click Launch Replication Monitor and find the publisher that has been just created. On the Agents tab you can monitor the progress of the Snapshot Agent and Log Reader Agent. Time needed for a snapshot to be created varies and depends on the size of data that’s participating in the replication. Once Snapshot Agent’s Last Action is “[100%] A snapshot of x article(s) was generated.” the publication is ready.
Configure replication subscription
Now when distribution and publication are configured, we need to create a subscription as the last piece of transactional replication setup. In the context menu of the Publication we’ve created, we are going to click on New Subscriptions. New Subscription Wizard will open. Let’s follow its steps.
On the Publication window, select the Publisher, database and the publication and click Next. On the Distribution Agent Location select option to Run all agents at the Distributor and click Next.
On the Subscriber window, click on Add Subscriber and then Add SQL Server Subscriber. Then connect to the SQL Server that will be the Subscriber. This SQL Server will be the target for the database migration. Use the server’s local IP address for this connection.
For the newly added Subscriber, for the Subscription Database chose New database option from the drop down menu, and in the New Database window, provide the Database name. This will be the name of the target database for the migration. Then click OK and then Next.
On the Distribution Agent Security window, click on the button with three dots, and in Distribution Agent Security window provide login and password for Distributor and Subscriber.
Follow through the wizard. On Synchronization Schedule leave the Agent Schedule set to Run continuously. This will enable continuous data flow from the source database to the destination and help with the online migration. On the Initialize Subscription window leave Initialize When set to Immediately. Then on the next window, Wizard Actions, leave Create the subscription(s) checked and click Next. On the Complete the Wizard click Next and wait for the Success status on the final Creating Subscription(s) step.
Once this is done, from the context menu of the newly created subscription, open the View Synchronization Status. You will see its progress and eventually all transactions will be replicated.
From the context menu of the Publication click Launch Replication Monitor.
Here in the Replication Monitor, in Agents tab you will see the progress of the Snapshot Agent and Log Reader Agent. On Snapshot agent reaches 100%, database is replicated, and ongoing transactions will be replicated as well.
Finally, connect to the SQL Server that is Subscriber and you will see that database is replicated.
With this you source database is migrated from Managed Instance to SQL Server. If the workload on the source database is still running, changes it might make will be replicated to the target database. Once you want to finalize the migration, you can set the source database to read-only mode, wait for the replication to complete (in the Replication monitor, Log Reader Agent's last action will be "No replicated transactions are available") and stop the transactional replication. This is done by disabling the corresponding SQL Agent jobs and deleting subscriber, distributor and publisher.
Transactional replication limitations and workarounds
Managed Instance documentation contains list of know errors related to transactional replication.
In case you want to know more, take a look at the official Managed Instance transactional replication documentation.
Primary key requirement
All tables that participate in transactional replication need to have primary key. If a table does not have one, you will have to create it. Alternatively, you can create materialized view over the table that's missing primary key, and replicate that materialized view. Once the data is in the target database, you can create the missing table and insert data from the replicated materialized view to the newly created table.
Conclusion
Transactional replication offers performant and robust way to migrate database from Managed Instance to SQL Server, with some limitations mentioned in this blog. For more information on the migration options, see Moving databases from Azure SQL Managed Instance to SQL Server.