Migrate Amazon RDS for SQL Server to Azure SQL Managed Instance – Part 3 (Azure Data Factory)

Published Jun 03 2022 04:59 AM 1,683 Views
Microsoft

This is the third post in the series Migrate Amazon RDS for SQL Server to Azure SQL Managed Instance, where the original overview article can be found at Migrate Amazon RDS for SQL Server to Azure SQL Managed Instance – Part 1 (Overview).

 

In the previous post, Migrate Amazon RDS for SQL Server to Azure SQL Managed Instance – Part 2 (Native Backup & Restore), we have discussed SQL Server native backup and restore option together with AZCopy tool to migrate from Amazon RDS for SQL Server to Azure SQL Managed Instance.

 

In this post, we demonstrate how to use the Azure Data Factory (ADF) to migrate data from Amazon RDS for SQL Server to Azure SQL Managed Instance.
Notice that the objective for this migration path is to transfer the data from the regular tables and that the transfer of the database objects such as users, permissions and other system objects is outside of the scope of this article. 

 

Azure Data Factory is Microsoft Azure's cloud ETL service for scale-out serverless data integration and data transformation. It offers a code-free UI for intuitive authoring and single-pane-of-glass monitoring and management. You can also lift and shift existing SQL Server Integration Services (SSIS) packages to Azure and run them with full compatibility inside Azure Data Factory.

 

Alternatively, if you are looking for a fully managed Platform-as-a-Service (PaaS) option for migrating data from AWS S3 to Azure Storage, consider Azure Data Factory (ADF), which provides these additional benefits:

  1. Azure Data Factory provides a code-free authoring experience and a rich built-in monitoring dashboard.
  2. Easily scale up the amount of horsepower to move data in a serverless manner and only pay for what you use.
  3. Use Azure Integration Runtime (IR) for moving data over the public Internet or use a self-hosted IR for moving data over AWS DirectConnect peered with Azure ExpressRoute.
  4. The ability to perform one-time historical load, as well as scheduled incremental load.
  5. Integrates with Azure Key Vault for credential management to achieve enterprise-grade security.
  6. Provides 80+ connectors out of box and native integration with all Azure data services so that you can leverage ADF for all your data integration and ETL needs across hybrid environments.

 

Prerequisites:

Before you get started, you must complete the following prerequisites:

  • Prepare and configure the Source RDS SQL Server Instance following the documentation for creating a Microsoft SQL Server DB instance and connecting to it
  • Prepare your target SQL Server Managed Instance running on Azure for migration
  • Create a SQL Server database in RDS targeted for migration
  • Setup a self-hosted integration run time inside a windows VM as SQL Server instance is located outside of Azure network, you need to set up a self-hosted integration runtime to connect to it.
  • If you don’t want data to be transferred over public Internet, you can achieve higher security by transferring data over a private peering link between AWS Direct Connect and Azure Express Route

 

Solution Architecture

NikoNeugebauer_0-1654247645824.png

 

Steps To Move Data from RDS SQL Server to Azure SQL Managed Instance using ADF

ssas

ws

  1. Create and populate data in RDS SQL Server database.
  2. Create a data factory
  3. Create linked services
    1. Create the source RDS SQL Server linked service
    2. Create the sink Azure SQL Managed Instance linked service
  4. Create datasets
    1. Create a dataset for source.
    2. Create a dataset for Sink
  5. Create Pipeline
  6. Trigger Pipeline
  7. Monitor Pipeline

 

Create and Populate data in RDS SQL Server database

Create tables and populate data in your source RDS SQL Server database that will drive the copy operation.

NikoNeugebauer_1-1654247889963.png

 

Azure services to access SQL server

Ensure that Allow Azure services and resources to access this server setting is turned ON for your server. This setting allows the Azure Data Factory service to write data to your Azure SQL Managed Instance. To verify and turn on this setting, go to your server > Security > Firewalls and virtual networks > set the Allow Azure services and resources to access this server to ON.

 

Create a data factory

Create and configure Azure Data factory by following instructions described in the Create a data factory by using the Azure portal and Azure Data Factory Studio.

NikoNeugebauer_2-1654248080750.png

Create linked service for Source RDS SQL Server

In order to read data from AWS RDS for SQL Server inside Azure Data Factory, we shall need to create new linked service:

 

  • Browse to the Manage tab in your Azure Data Factory and then select the New button on the toolbar.
  • On the New linked service page, select SQL Server, and then select Continue.
  • Fill out the required fields and be sure to choose your self-hosted integration runtime under Connect via integration runtime. Test Connection and select Create.

NikoNeugebauer_5-1654249254414.png

 

Create linked service for Sink SQL Managed Instance

A similar operation needs to be executed for creation of the linked service to transfer data to the destination - Azure SQL Managed Instance:

  • Browse to the Manage tab in your Azure Data Factory and then select the New button on the toolbar.
  • On the New linked service page, select Azure SQL Managed Instance, and then select Continue.
  • Fill out the required fields, Test Connection and select Create.

NikoNeugebauer_4-1654249198208.png

 

Create datasets for source RDS SQL database a sink for Azure SQL Managed Instance

As a next step we shall create source and sink datasets, which define the schemas for the source and destination of the data.

The input dataset will be called AmazonRdsForSqlServerTable1 and the destination dataset (sink) willl be named AzureSqlMITable1.

 

The input source dataset AmazonRdsForSqlServerTable1 refers to the linked service AmazonRDSforSQLServer1 that we have created in the previous step and that points to AWS RDS for SQL Server. The linked service specifies the connection string to connect to the RDS SQL Server database. The dataset specifies the name of the database and the table that contains the source data.


For creating it follow these steps:

  1. Select Author tab from the left pane.
  2. Select the + (plus) in the left pane, and then select Dataset.
  3. In the New Dataset window, select Amazon RDS for SQL Server, and then click Continue.
  4. In the Set properties window, under Name, enter AmazonRdsForSqlServerTable1. Under Linked service, select AmazonRDSforSQLServer1. Then click OK.

Switch to the Connection tab, select any table for Table. This table is a dummy table. You specify a query on the source dataset when creating a pipeline. The query is used to extract data from your database. Alternatively, you can click Edit check box, and enter a placeholder name [dbo.dummyName] as the table name. This placeholder name will serve will be substituted for the name of the real tables once the ADF pipeline will be parametrized.

 

 

NikoNeugebauer_6-1654249517343.png

 

The output dataset AzureSqlMITable1 refers to the linked service AzureSqlMI1. The linked service specifies the connection string to connect to the Azure SQL Managed Instance. The dataset specifies the database and the table to which the data is copied to.

 

  1. Click + (plus) in the left pane and click Dataset.
  2. In the New Dataset window, select Azure SQL Managed Instance, and then click Continue.
  3. In the Set properties window, under Name, enter [AzureSqlMITable1]. Under Linked service, select AzureSqlMI1. Then click OK.
  4. Switch to the Parameters tab, click + New, and enter [DWTableName] for the parameter name. Click + New again, and enter [DWSchema] for the parameter name. If you copy/paste this name from the page, ensure that there's no trailing space character at the end of DWTableName and DWSchema.
  5. Switch to the Connection tab,
    1. For Table, check the Edit option. Select into the first input box and click the Add dynamic content link below. In the Add Dynamic Content page, click the DWSchema under Parameters, which will automatically populate the top expression text box @dataset().DWSchema, and then click Finish.

      NikoNeugebauer_0-1654250427133.png

    2. Select into the second input box and click the Add dynamic content link below. In the Add Dynamic Content page, click the [DWTAbleName] under Parameters, which will automatically populate the top expression text box @dataset().DWTableName, and then click Finish.
    3. The tableName property of the dataset is set to the values that are passed as arguments for the [DWSchema] and [DWTableName] parameters. The ForEach activity iterates through a list of tables and passes one by one to the Copy activity.

 

Create ADF pipeline to iterate and copy tables 

 

Lets create the IterateAndCopySQLTables pipeline which will take a list of tables as a parameter. For each table in the list, it copies data from the RDS SQL Server tables in AWS to Azure SQL Managed Instance

Switch to the Source tab, and do the following steps:

  1. Select AzureSqlDatabaseDataset for Source Dataset.
  2. Select Query option for Use query.
  3. Click the Query input box -> select the Add dynamic content below -> enter the following expression for Query -> select Finish:
    SELECT * FROM [@{item().TABLE_SCHEMA}].[@{item().TABLE_NAME}]

NikoNeugebauer_1-1654250617193.png

 

Switch to the Sink tab, and do the following steps:

  1. Select AzureSqlDWDataset for Sink Dataset.
  2. Click the input box for the VALUE of DWTableName parameter -> select the Add dynamic content below, enter @item().TABLE_NAME expression as script, -> select Finish.
  3. Click the input box for the VALUE of DWSchema parameter -> select the Add dynamic content below, enter @item().TABLE_SCHEMA expression as script, -> select Finish.
  4. For Table option, the default setting is "None". If you don’t have tables pre-created in the sink Azure Synapse Analytics, enable Auto create table option, copy activity will then automatically create tables for you based on the source data. For details, refer to Auto create sink tables.

The pre-copy script property specifies a SQL query for the copy activity to run before writing data into SQL Managed Instance. It's invoked only once per copy run. You can use this property to clean up preloaded data.

 

Click the Pre-copy Script input box -> select the Add dynamic content below -> enter the following expression as script -> select Finish

 

IF EXISTS (SELECT * FROM [@{item().TABLE_SCHEMA}].[@{item().TABLE_NAME}]) TRUNCATE TABLE [@{item().TABLE_SCHEMA}].[@{item().TABLE_NAME}]

 

On the picture below you can see the result of configuration

NikoNeugebauer_2-1654251215873.png

 

Create the pipeline for transferring data from AWS RDS to Azure SQL Managed Instance 

In the next step we can create the pipeline which shall be named GetTableListAndTriggerCopyData and which will perform two actions:

  • Look up the RDS SQL Server Database metadata table to get the list of tables to be copied.
  • Trigger the pipeline IterateAndCopySQLTables to do the actual data copy.

The IterateAndCopySQLTables pipeline takes a list of tables as a parameter. For each table in the list, it copies data from the table in Azure SQL Managed instance.

Here are the steps to create the pipeline:

  1. In the left pane, click + (plus), and click Pipeline.
  2. In the General panel under Properties, change the name of the pipeline to GetTableListAndTriggerCopyData.
  3. In the Activities toolbox, expand General, and drag-drop Lookup activity to the pipeline designer surface, and do the following steps:
    1. Enter LookupTableList for Name.
    2. Enter Retrieve the table list from my database for Description.
  4. Switch to the Settings tab, and do the following steps:
    1. Select AmazonRdsForSqlServerTable1 for Source Dataset.
    2. Select Query for Use query.

Enter the following SQL query for Query:

 

SELECT * FROM [@{item().TABLE_SCHEMA}].[@{item().TABLE_NAME}]

 

NikoNeugebauer_3-1654252082767.png

Switch to the Sink tab, and do the following steps:

  1. Select AzureSqlDWDataset for Sink Dataset.
  2. Click the input box for the VALUE of DWTableName parameter -> select the Add dynamic content below, enter @item().TABLE_NAME expression as script, -> select Finish.
  3. Click the input box for the VALUE of DWSchema parameter -> select the Add dynamic content below, enter @item().TABLE_SCHEMA expression as script, -> select Finish.

For Table option, the default setting is "None". If you don’t have tables pre-created in the sink Azure Synapse Analytics, enable Auto create table option, copy activity will then automatically create tables for you based on the source data. For details, refer to Auto create sink tables.

 

The pre-copy script property specifies a SQL query for the copy activity to run before writing data into SQL Managed Instance. It's invoked only once per copy run. You can use this property to clean up preloaded data.

 

Click the Pre-copy Script input box -> select the Add dynamic content below -> enter the following expression as script -> select Finish

 

IF EXISTS (SELECT * FROM [@{item().TABLE_SCHEMA}].[@{item().TABLE_NAME}]) TRUNCATE TABLE [@{item().TABLE_SCHEMA}].[@{item().TABLE_NAME}]

 

NikoNeugebauer_4-1654252757743.png

Create the pipeline for listing the tables to be transferred 

Create the GetTableListAndTriggerCopyData pipeline which will perform two actions:

  • Look up the RDS SQL Server Database metadata table to get the list of tables to be copied.
  • Trigger the pipeline IterateAndCopySQLTables to do the actual data copy.

The IterateAndCopySQLTables pipeline takes a list of tables as a parameter. For each table in the list, it copies data from the table in Azure SQL Managed instance.

Here are the steps to create the pipeline:

  1. In the left pane, click + (plus), and click Pipeline.
  2. In the General panel under Properties, change the name of the pipeline to GetTableListAndTriggerCopyData.
  3. In the Activities toolbox, expand General, and drag-drop Lookup activity to the pipeline designer surface, and do the following steps:
    1. Enter LookupTableList for Name.
    2. Enter Retrieve the table list from my database for Description.
  4. Switch to the Settings tab, and do the following steps:
    1. Select AmazonRdsForSqlServerTable1 for Source Dataset.
    2. Select Query for Use query.
    3. Enter the following SQL query for Query:
      SELECT TABLE_SCHEMA, TABLE_NAME FROM information_schema.TABLES WHERE TABLE_TYPE = 'BASE TABLE';

      The INFORMATION_SCHEMA.TABLES view allows you to get information about all tables and views within a database. By default, it will show you this information for every single table and view that is in the database. You can use this query to list all the table names and schema names inside the source database. Notice that in some of the cases, such as Temporal Tables might require additional work and configurations.

    4. Clear the checkbox for the First row only field.

       

       

    NikoNeugebauer_5-1654254117201.png

  5. Drag-drop Execute Pipeline activity from the Activities toolbox to the pipeline designer surface, and set the name to TriggerCopy.
  6. To Connect the Lookup activity to the Execute Pipeline activity, drag the green box attached to the Lookup activity to the left of Execute Pipeline activity. 

NikoNeugebauer_6-1654254741402.png

 

Trigger a pipeline run

Now that the pipelines are configured, it’s time to run them.

  1. Select GetTableListAndTriggerCopyData from the available Pipelines.
  2. Select Add Trigger, then select Trigger Now and select Finish.

 

Monitor the pipeline run

  1. Switch to the Monitor tab. Click Refresh until you see runs for both the pipelines in your solution. Continue refreshing the list until you see the Succeeded status.
  2. To view activity runs associated with the GetTableListAndTriggerCopyData pipeline, click the pipeline name link for the pipeline. You should see two activity runs for this pipeline run.
  3. Connect to Sink database on Azure SQL MI and verify the migrated data.

NikoNeugebauer_7-1654254800488.png

 

 

NikoNeugebauer_8-1654254800493.png

Summary 

In this blog post, we have discussed usage of Azure Data Factory for migrating data from Amazon RDS for SQL Server Instance to Azure SQL Managed Instance.

In the next post, we will dive into using Smart Bulk copy option for the same data migration purposes.

 

1 Comment
Version history
Last update:
‎Jun 03 2022 04:59 AM
Updated by: