Database Migration / Reverse Migration between Azure SQL (DB/MI) and SQL Server using SmartBulkCopy
Published Mar 28 2022 08:57 AM
Microsoft

Introduction

In this post we address a common ask we hear when working with customers: the need to migrate databases between the various Azure SQL offerings, as well as to reverse migrate databases back to on-premises SQL Servers running in the customer's environment. There can be multiple reasons for such migrations, such as moving from an existing Azure SQL Database implementation to Azure SQL Managed Instance because of requirements for certain native functionality, or the need to prove reverse migration capability to bring data back on-premises for compliance or regulatory reasons. We will cover the steps you can take to accomplish these patterns using an open-source tool called SmartBulkCopy and a Microsoft tool called SqlPackage.

 

All the scripts mentioned in this blog post can be downloaded from here as a single .zip file; descriptions and explanations for each individual script are included below.

 

Migration Prerequisites:

All of the prerequisites below must be met to successfully migrate the source database(s) to or from Azure SQL DB, Azure SQL MI, SQL Server on an Azure IaaS VM, or SQL Server on-premises.

 

To successfully execute the included scripts the following tools need to be installed on a machine where the migration will be initiated:

  • PowerShell or PowerShell Core (7.0 or newer)
  • SqlPackage, if the schema migration is to be executed via a DacPac.
  • SmartBulkCopy, to automate the data copy activity. SmartBulkCopy requires .NET Core 3.1 to be installed.
  • For schema migration using mssql-scripter (instead of sqlpackage.exe) there are some additional prerequisites; these are explained later in the blog.

 

Cross-Platform Migration

The sections below describe the logical steps involved in migrating, or reverse migrating, a database between any SQL Server source and target. The migration is logical, meaning it includes both schema and data migration; it applies where native backup/restore cannot be leveraged because it is not supported (Azure SQL DB) or because the backups are incompatible with the destination (SQL MI vs. SQL Server on VM/on-premises).

Migrations within the same Azure SQL service, i.e., between Azure SQL DB logical servers or between Azure SQL Managed Instances, can leverage the native backup/restore feature that the service supports and are therefore not in scope for this blog.

 

Migrating the Database Schema

The first step is schema migration, which must be completed before the data copy activity is initiated. The schema migration can be completed using SqlPackage, mssql-scripter, or any similar tool. Usually, the most efficient and simplest way to perform schema migration is with SqlPackage (DacPac). However, mssql-scripter can be very useful for schema generation and deployment at large scale, where multiple databases are being moved.

 

Using SQLPackage for Schema Migration

SqlPackage must be preinstalled on the machine where the schema extraction and deployment are going to be initiated.

 

Schema Extract with SQLPackage

To extract the schema run the below command. To find out more about the SqlPackage extract parameters visit the site here.

 

Parameter name | Description
/Action:Extract | Extract the database schema.
/ssn | Source SQL Server name.
/sdn | Source SQL Server database name.
/tf | Target file name (.dacpac).
/su | Source user. The login used to connect to the source SQL database.
/sp | Source user's password.

 

sqlpackage.exe /Action:Extract  /MaxParallelism:8  /ssn:source_sql.database.windows.net /sdn:source_db /su:XXXXX /sp:XXXXX /tf:"C:\TEMP\source_db.dacpac" 
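When several databases need to be extracted, the same command line can be generated programmatically instead of typed by hand. The sketch below (illustrative Python; server names, credentials, and paths are placeholders, not real endpoints) builds the sqlpackage argument list for each database so the extracts can be scripted in a loop:

```python
# Hypothetical helper: build sqlpackage extract command lines for several
# databases. Each list can be handed to subprocess.run() on the machine
# where sqlpackage is installed.

def build_extract_cmd(server, database, user, password, out_dir, parallelism=8):
    """Return the sqlpackage argument list for one schema extract."""
    return [
        "sqlpackage",
        "/Action:Extract",
        f"/MaxParallelism:{parallelism}",
        f"/ssn:{server}",
        f"/sdn:{database}",
        f"/su:{user}",
        f"/sp:{password}",
        f"/tf:{out_dir}/{database}.dacpac",
    ]

# Placeholder server, login, and database names for illustration only.
cmds = [
    build_extract_cmd("source_sql.database.windows.net", db,
                      "admin_user", "secret", "C:/TEMP")
    for db in ["db1", "db2"]
]
```

Each entry in `cmds` is one ready-to-run extract; the matching publish step only differs in the `/Action`, target parameters, and `/sf` source file.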

 

Schema Deployment with SQLPackage

Before the schema is deployed, an empty database must first be created on the destination server. To deploy the schema run the below command. To find out more about the SqlPackage publish parameters visit the site here.

 

Parameter name | Description
/Action:Publish | Publish the database schema.
/tsn | Target SQL Server name.
/tdn | Target SQL Server database name.
/sf | Source file name (.dacpac).
/tu | Target user. The login used to connect to the destination SQL database.
/tp | Target user's password.

 

 

SqlPackage.exe /Action:Publish /MaxParallelism:8 /tsn:target_sql.database.windows.net /tdn:target_db  /tu:XXXX /tp:XXXXXX /sf:"C:\TEMP\source_db.dacpac" /p:AllowIncompatiblePlatform=true

 

 

Using mssql-scripter for Schema Migration

Mssql-scripter is useful if large numbers of databases must be scripted out; it is a good alternative for automating schema generation and deployment in such scenarios.

 

To use the mssql-scripter the following tools must be installed:
  • Python (https://www.python.org/downloads/)
    • While installing Python select the “Add Python to PATH” option. Python must be in the PATH environment variable.
  • mssql-scripter
    • Once Python is installed, open the command line and execute: pip install mssql-scripter
  • Sqlcmd
  • PowerShell or PowerShell Core (7.0 or newer)
    • The Az.Sql module is required
 
Schema Extract with mssql-scripter

The following parameters are used in the script (schema_generation.ps1); the script for this step is available here.

 

Parameter name | Description
$sourceSQL | The source SQL Server.
$sourceUser | The SQL user account used to connect to the source SQL Server.
$sourcePasswordPlain | The password for the $sourceUser account.
$databasename | An array/list of databases to be scripted out.
$fileFolder | Temporary folder where all deployment scripts are generated.
$file | PowerShell file generated by the script; it stores the mssql-scripter commands. Do not change the name of this file.

 

Schema Deployment with mssql-scripter

The following parameters are used in the script (schema_deployment.ps1); the script is available here.

 

Parameter name | Description
$targetSQLFqdn | The target SQL Server name. For Azure SQL DB or MI use the FQDN.
$targetLogin | The user used to connect to the target server and deploy the database schema.
$targetPasswordPlain | The password for the $targetLogin user.
$databasename | An array/list of databases to be deployed.
$fileFolder | Temporary folder where all deployment scripts are stored.
$sqlcmdFolder | The location of the sqlcmd utility, e.g. C:\Program Files\Microsoft SQL Server\Client SDK\ODBC\170\Tools\Binn

 

Preparing the Target Database

The target database must be initialized: as the first step an empty database must be created, and then the schema should be deployed. However, the data copy procedure requires that all foreign key constraints are dropped before the copy activity is initiated. Additionally, it is recommended to drop all secondary indexes and load the data into heap or clustered index structures. The attached SQL scripts simplify this clean-up and, as a final step, recreate the dropped objects after the data migration has completed.

 

PLEASE NOTE:

The scripts to create the objects mentioned below can be found here

The following stored procedures must be deployed in the target database.

Do not deploy the following stored procedures in the source database.

 

 

  • dbo.SmartBulkCopyDropAndCreateFKs procedure is used to drop and recreate all foreign key constraints
  • dbo.SmartBulkCopyDropSecondaryIndexes procedure is used to drop all secondary indexes
  • dbo.SmartBulkCopyCreateSecondaryIndexes procedure is used to recreate all secondary indexes
  • dbo.SmartBulkCopyMigrationMaintenance procedure is used to orchestrate the process. It’s executed before and after the copy activity.

Once all 4 procedures are deployed/created in the target database execute the following command in the target database if the secondary indexes should be dropped:

 

exec dbo.SmartBulkCopyMigrationMaintenance @execType = N'PREMIG', @secondaryIndexes=1

 

 

If the secondary (non-clustered) indexes should be left untouched, execute the following command in the target database instead. Keeping the secondary indexes may slow down the data copy activity, but for very large tables it may reduce the overall migration time by avoiding a lengthy index rebuild afterwards.

 

exec dbo.SmartBulkCopyMigrationMaintenance @execType = N'PREMIG', @secondaryIndexes=0

 

 

Please note the following tables are created by the SmartBulkCopyMigrationMaintenance stored procedure:

 

  • dbo.SmartBulkCopyCmds stores all commands to create and drop objects.
  • dbo.SmartBulkCopyExecutedCmds stores all commands that have been executed successfully. Initially the table is empty; it is populated as the pre- and post-migration procedures complete.
  • dbo.SmartBulkCopyErrorCmds stores the commands that failed. Initially the table is empty.

 

Before proceeding, verify that there are no rows in the dbo.SmartBulkCopyErrorCmds table. If any foreign key or secondary index could not be removed, the corresponding command will be recorded in that table.

You may rerun the procedure (exec dbo.SmartBulkCopyMigrationMaintenance @execType = N'PREMIG') and only the errored commands will be re-executed; successfully executed commands are skipped in subsequent executions.
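The rerun behaviour can be pictured as a simple bookkeeping pattern: successful commands move to an "executed" set, failures to an "error" set, and a rerun retries only what has not yet succeeded. The sketch below is illustrative Python, not the procedure's actual T-SQL, and the command names are hypothetical:

```python
# Illustrative sketch of the PREMIG/POSTMIG bookkeeping. `run` stands in for
# executing a command against the target database.

def run_maintenance(commands, executed, errors, run):
    """Execute each pending command; reruns skip commands already executed."""
    errors.clear()
    for cmd in commands:
        if cmd in executed:
            continue  # already applied in a previous run
        try:
            run(cmd)
            executed.add(cmd)
        except Exception:
            errors.add(cmd)  # recorded for a later rerun

# Simulate one command failing on the first run, then a rerun retrying it.
attempts = []
def flaky(cmd):
    attempts.append(cmd)
    if cmd == "DROP FK2" and attempts.count(cmd) == 1:
        raise RuntimeError("blocked by a lock")

executed, errors = set(), set()
run_maintenance(["DROP FK1", "DROP FK2"], executed, errors, flaky)
run_maintenance(["DROP FK1", "DROP FK2"], executed, errors, flaky)  # retries FK2 only
```

After the second pass both commands have succeeded and the error set is empty, mirroring what a clean dbo.SmartBulkCopyErrorCmds table indicates.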

 

Migrating the Data

As the migration is logical, i.e., it involves a data copy activity, downtime starts when the data copy is initiated. It is highly recommended to set the source database to read-only during the copy activity to avoid any data discrepancies during the data movement.

 

This step leverages SmartBulkCopy program to automate the migration. Follow the steps to initiate data migration:

  • Download or clone the SmartBulkCopy repository.  Make sure you have .NET Core 3.1 installed.
  • Edit the smartbulkcopy.config file located in smartbulkcopy-master\client folder. (If the file does not exist make a copy of smartbulkcopy-master\client\smartbulkcopy.config.template file and rename the copy).  Edit the source and destination connection strings and make sure the safe-check is set to none.
  • If the secondary (non-clustered) indexes should be untouched please make sure the secondary-indexes option is set to false.

An example of the config file can be seen here:

{
  "source": {
    "connection-string": "Server=source_server.XXXX; Initial Catalog=your_database; Connection Timeout=90; User ID=XXXXX; Password=XXXXXXX"
  },
  "destination": {
    "connection-string": "Server=destination_server.XXXX; Initial Catalog=your_database; Connection Timeout=90; User ID=XXXXX; Password=XXXXX"
  },
  "tables": {
    "include": [ "*" ],
    "exclude": []
  },
  "options": {
    "tasks": 24,
    "logical-partitions": "auto",
    "batch-size": 1000000,
    "truncate-tables": false,
    "safe-check": "none",
    "stop-if": {
      "secondary-indexes": false,
      "temporal-table": true
    },
    "retry-connection": {
      "delay-increment": 10,
      "max-attempt": 5
    }
  }
}
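When many databases are migrated, the config file can be generated per database rather than edited by hand. The sketch below (Python; connection strings are placeholders) emits the same keys as the SmartBulkCopy template:

```python
import json

# Sketch: generate a smartbulkcopy.config per database. The keys mirror the
# template shipped with SmartBulkCopy; server names and credentials here are
# placeholders, not real endpoints.

def make_config(source_cs, dest_cs, tasks=24):
    return {
        "source": {"connection-string": source_cs},
        "destination": {"connection-string": dest_cs},
        "tables": {"include": ["*"], "exclude": []},
        "options": {
            "tasks": tasks,
            "logical-partitions": "auto",
            "batch-size": 1000000,
            "truncate-tables": False,
            "safe-check": "none",
            "stop-if": {"secondary-indexes": False, "temporal-table": True},
            "retry-connection": {"delay-increment": 10, "max-attempt": 5},
        },
    }

cfg = make_config("Server=src; Initial Catalog=db1; User ID=u; Password=p",
                  "Server=dst; Initial Catalog=db1; User ID=u; Password=p")
text = json.dumps(cfg, indent=2)  # write this out as smartbulkcopy.config
```

Generating the file also guarantees valid JSON, avoiding hand-editing mistakes such as a missing quote.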

 

  • Open the command prompt and go to the client folder (smartbulkcopy-master\client).
  • Execute dotnet run to start the data copy activity.
  • Once the copy activity is completed, SmartBulkCopy reports the total time and status (shown as screenshots in the original post).
  • The last step is to recreate the secondary indexes and foreign keys dropped earlier, as described in the next section.
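The retry-connection settings in the config file govern how dropped connections are retried. As a rough mental model (illustrative Python; SmartBulkCopy's actual internals may differ), the wait before each reconnection attempt grows by delay-increment seconds, up to max-attempt tries:

```python
# Sketch of the retry-connection behaviour: the delay before each attempt
# grows by `delay_increment` seconds, for at most `max_attempt` attempts.
# This is an assumption about the semantics, not SmartBulkCopy source code.

def retry_delays(delay_increment=10, max_attempt=5):
    """Seconds to wait before each reconnection attempt."""
    return [delay_increment * attempt for attempt in range(1, max_attempt + 1)]

delays = retry_delays()  # [10, 20, 30, 40, 50]
```

With the defaults shown earlier (delay-increment 10, max-attempt 5), a flaky connection is retried for up to five attempts before the run fails.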

 

Recreate the dropped objects:

After the data copy activity is completed, the last remaining part is to recreate all objects (secondary indexes and foreign keys) dropped earlier. To do so execute the following command in the target database:

exec dbo.SmartBulkCopyMigrationMaintenance @execType = N'POSTMIG'

 

Depending on the data size, number of indexes, and target database performance tier, the procedure may take some time to complete.

Once the post-migration procedure has finished, verify that all commands completed successfully and that no error commands were recorded in the dbo.SmartBulkCopyErrorCmds table.

If anything is recorded in dbo.SmartBulkCopyErrorCmds, rerun the stored procedure (exec dbo.SmartBulkCopyMigrationMaintenance @execType = N'POSTMIG'); it will try to (re)create only the errored objects.

 

Post Migration Validation

The post-migration validation step ensures that all the data has been migrated and that the source and target schemas are the same. The basic validation relies purely on row and index counts and should be sufficient if access to the source database is restricted during the migration and it is guaranteed no row has been updated.

The thorough validation performs more advanced checks but is also more time consuming.

 

Basic migration validation

This PowerShell script (available here) does a basic comparison between the source and target database. The following properties are compared:

  • Number of rows per table
  • Number of indexes and unique indexes per table
  • Number of Foreign Keys per table

The script highlights a table in red if any of these properties (row count, index count, unique index count, or foreign key count) does not match between the source and target.
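The comparison at the heart of this check can be sketched in a few lines. The snippet below is illustrative Python, not the BasicValidation.ps1 script itself, and the per-table counts are hypothetical sample data rather than real query output:

```python
# Illustrative comparison step: for each table, compare row, index,
# unique-index, and foreign-key counts collected from source and target,
# and flag any table where they differ.

def mismatched_tables(source_stats, target_stats):
    """Return the table names whose properties differ between source and target."""
    bad = []
    for table, src in source_stats.items():
        if target_stats.get(table) != src:
            bad.append(table)
    return bad

# Hypothetical counts, as if gathered by per-database metadata queries.
source = {"dbo.Orders": {"rows": 100, "indexes": 3, "unique": 1, "fks": 2},
          "dbo.Items":  {"rows": 500, "indexes": 2, "unique": 1, "fks": 1}}
target = {"dbo.Orders": {"rows": 100, "indexes": 3, "unique": 1, "fks": 2},
          "dbo.Items":  {"rows": 499, "indexes": 2, "unique": 1, "fks": 1}}

print(mismatched_tables(source, target))  # ['dbo.Items'] - one row is missing
```

Any flagged table warrants a closer look, typically by rerunning the copy for that table or falling back to the thorough validation below.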

 

The following parameters are used in the script (BasicValidation.ps1) and it can be downloaded here.

 

Parameter name | Description
$sourceSqlServer | The source SQL Server.
$sourceDatabaseName | The source database name.
$sourceUser | Source user. The login used to connect to the source SQL database.
$sourcePw | The password for the $sourceUser account.
$targetSqlServer | The target SQL Server.
$targetDatabaseName | The target database name.
$targetUser | Target user. The login used to connect to the target SQL database.
$targetPw | The password for the $targetUser account.

 

 

Thorough migration validation

The Database Compare application is a Windows console application, written in C#, that allows comparison of SQL Server to SQL Server databases, or of Oracle, Db2, MySQL, PostgreSQL, Teradata, or Netezza databases to SQL Server. For each table processed, the application builds SQL SELECT statements for both the source and target platforms that combine the columns of each row into a single string, which the query then hashes using an MD5 hash function. The MD5 algorithm was chosen because it is faster than newer hashes but still has an excellent ability to detect differences in strings.
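The row-hash idea can be illustrated in a few lines. Note this is a sketch of the technique, not the Database Compare tool's actual code: the real tool builds equivalent SELECT statements on each platform, and the delimiter and value formatting below are assumptions:

```python
import hashlib

# Sketch: concatenate a row's columns into one string and hash it with MD5,
# so two rows match if and only if their hashes match. Delimiter choice and
# None handling here are illustrative assumptions.

def row_hash(row, sep="|"):
    joined = sep.join("" if v is None else str(v) for v in row)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

a = row_hash(("ORD-1", 42, "2022-03-28"))
b = row_hash(("ORD-1", 42, "2022-03-28"))  # identical row: same hash as a
c = row_hash(("ORD-1", 43, "2022-03-28"))  # one changed column: different hash
```

Comparing one short hash per row rather than every column keeps the cross-platform comparison cheap while still detecting any difference.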

 

For detailed information on the tool and use cases as well as download link please see here

 

Best Practice Recommendations:

The time needed to complete the migration depends on the source and target SQL Server/database resources as well as the network throughput. It is recommended to test the migration using a copy of the database to estimate the time needed and to ensure the migration completes without problems.

 

The following generic recommendations can be used to ensure optimal migration performance:

  • Create an Azure VM with accelerated networking and enough vCores and memory to run the SmartBulkCopy program. Co-locate the machine with the source and/or target SQL Server.
    • If Azure SQL DB or Azure SQL Managed Instance is the source or target, we can (internally) identify the availability zone placement and co-locate the VM for the lowest network latency.
  • Scale up the target and source SQL Server/database to ensure high log throughput and enough compute power. Note that the post-migration activity is also very resource intensive, as all non-clustered indexes must be rebuilt; a higher number of vCores can parallelize the index builds.
    • Gen5 Business Critical with 8 vCores or more offers 96 MB/s log throughput.
  • For SQL Server on an Azure VM or on-premises, Persisted Version Store can be enabled to allow log truncation during the post-migration activity.
  • If any source table has a clustered columnstore index, use a batch size of 1048576 to load the data efficiently into compressed segments: "batch-size": 1048576
  • Verify there are no tables where lock escalation is disabled. SmartBulkCopy transfers huge amounts of data, and disabling lock escalation may put unnecessary pressure on memory allocations, shrink the buffer pool, and slow down the data read activity.

SELECT [name], lock_escalation_desc
FROM sys.tables
WHERE lock_escalation_desc NOT IN (N'TABLE', N'AUTO');

 

Final Thoughts

We hope that this post has helped you and described scenarios and options for using the solution to migrate your databases successfully. If you have feedback or suggestions for improving this data migration asset, please contact the Data Platform Engineering Team. Thanks for your support!

Last update: Mar 29 2022