Design your Data pipelines in Azure Data Factory to load data into Azure Database for MySQL
Published Sep 10 2019

One of the core value propositions of running your MySQL databases in a fully managed service like Azure Database for MySQL is richer native integration with other services in the Azure ecosystem. Today, we are happy to announce that Azure Database for MySQL is now a supported sink destination in Azure Data Factory. Azure Data Factory is a fully managed data integration service that allows you to create data-driven workflows in a code-free visual environment in Azure for orchestrating and automating data movement and data transformation.

 

Azure Data Factory supports 80+ data sources across Azure, on-premises, multi-cloud, and SaaS platforms, orchestrating secure data transfer among them. Azure Database for MySQL, which runs the community edition of open source MySQL, is a preferred relational data store for customers looking for a reliable, low-cost, and secure open data platform. Supporting Azure Database for MySQL as a target destination for data movement has been one of the top requests from customers. The native integration of Azure Data Factory with Azure Database for MySQL unlocks many hybrid and multi-cloud architectures. You can now leverage Azure Data Factory for secure one-time data movement, or for continuous data pipelines that load data into Azure Database for MySQL for analytics and reporting from disparate data sources running on-premises, in Azure, on SaaS platforms, or in other clouds.

 

[Image: data sources across on-premises, Azure, SaaS platforms, and other clouds flowing through Azure Data Factory into Azure Database for MySQL]

 

Copy Data into Azure Database for MySQL

In Azure Data Factory, you can use the Copy activity to copy data among data stores located on-premises and in the cloud. After you copy the data, you can use other activities to further transform and analyze it, as described in the documentation. If you want to copy data from a file system or a cloud-based object store, the Copy activity can also read from files in these formats:

  • Text
  • JSON
  • Avro
  • ORC
  • Parquet

For up-to-date information on the supported data sources and file formats, please refer to the documentation.

 

At a high level, to use the Copy activity in Azure Data Factory, you need to:

  1. Create linked services for the source data store and the sink data store. Linked services are much like connection strings, which define the connection information that's needed for Azure Data Factory to connect to external resources. 
  2. Create datasets for the source and sink. Datasets define the schema, table name, or file name within the linked services created in the previous step, identifying where the Copy activity reads the data from and writes it to.
  3. Create a pipeline with the Copy activity. In this step, you map the source and sink datasets created in the previous step and define the column mapping to ensure there are no data conversion errors.

For the latest information on the Copy activity, please refer to the documentation.

 

You can author and run a Copy activity data pipeline in Azure Data Factory using several toolchains, including the Azure portal UI, the Copy Data tool, Azure PowerShell, the .NET and Python SDKs, the REST API, and Azure Resource Manager templates.
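For readers who prefer to script these steps rather than click through the portal walkthrough that follows, the later sections also include short, optional sketches using the Python SDK toolchain (azure-mgmt-datafactory). These are minimal sketches under stated assumptions, not a full solution: the subscription, resource group, and factory names are placeholders, recent versions of azure-identity and azure-mgmt-datafactory are assumed, and the data factory is assumed to already exist. This first snippet sets up the management client that the later sketches reuse.

```python
# Minimal setup sketch for the optional Python SDK snippets in this post.
# Assumes `pip install azure-identity azure-mgmt-datafactory` and that you are
# signed in (for example via `az login`); angle-bracket values are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

subscription_id = "<subscription-id>"
rg_name = "<resource-group>"
df_name = "<data-factory-name>"

credential = DefaultAzureCredential()
adf_client = DataFactoryManagementClient(credential, subscription_id)

# Confirm the data factory created in the QuickStart is reachable.
factory = adf_client.factories.get(rg_name, df_name)
print(factory.name, factory.provisioning_state)
```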

 

 

In this blog post, we will walk through the Azure portal UI experience. Once you create the Azure Data Factory service via the portal using the steps mentioned in the QuickStart documentation, you will see the “Let’s get started” page as shown below. On the Let's get started page, switch to the Author tab in the left panel.

 

[Screenshot: the Azure Data Factory “Let’s get started” page with the Author tab in the left panel]

 

Create Linked Services

After clicking the Author tab, create the linked services by clicking Connections and selecting the New button on the toolbar, as described in the documentation and shown below. In this step, you define the connections for both the source and target linked services. For the Azure Database for MySQL linked service, you need to provide the server name, port, database name, and the username and password of a user with permissions to connect to the Azure Database for MySQL server, perform schema discovery, and read and write data to the table.

 

[Screenshot: creating the new linked services under Connections, including the Azure Database for MySQL connection details]
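If you would rather define the same linked services in code, here is a minimal sketch using the Python SDK client from the setup snippet above. It assumes Azure Blob Storage as the source; the linked service names, connection strings, server, database, and credentials are all placeholders.

```python
# Sketch: create the source (Blob Storage) and sink (Azure Database for MySQL)
# linked services via the SDK. All angle-bracket values are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AzureBlobStorageLinkedService, AzureMySqlLinkedService,
    LinkedServiceResource, SecureString)

# Management client and names as in the setup sketch earlier in this post.
adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")
rg_name, df_name = "<resource-group>", "<data-factory-name>"

# Source linked service: Azure Blob Storage.
blob_ls = LinkedServiceResource(properties=AzureBlobStorageLinkedService(
    connection_string=SecureString(
        value="DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>")))
adf_client.linked_services.create_or_update(rg_name, df_name, "SourceBlobLinkedService", blob_ls)

# Sink linked service: Azure Database for MySQL (add SSL options as your server requires).
mysql_ls = LinkedServiceResource(properties=AzureMySqlLinkedService(
    connection_string=SecureString(value=(
        "Server=<server>.mysql.database.azure.com;Port=3306;"
        "Database=<database>;UID=<user>@<server>;PWD=<password>"))))
adf_client.linked_services.create_or_update(rg_name, df_name, "AzureMySqlLinkedService", mysql_ls)
```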

 

Create Datasets

In this step, you create two datasets, one for the source and one for the target, by clicking the ‘+’ (plus) sign and selecting Dataset, as described in the documentation. On the Connection tab of the sink dataset, map it to the Azure Database for MySQL linked service defined in the previous step by selecting it from the dropdown. Next, select the destination table to load data into on the Azure Database for MySQL server, as shown below.

 

 

[Screenshot: dataset configuration mapping the Azure Database for MySQL linked service and selecting the destination table]
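The equivalent SDK calls for this step look roughly like the sketch below. It assumes a CSV file in Blob Storage as the source and reuses the placeholder linked service names from the previous sketch; the container, folder, file, dataset, and destination table names are also placeholders.

```python
# Sketch: define the source (CSV in Blob Storage) and sink (Azure Database for MySQL
# table) datasets. All angle-bracket values and dataset names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AzureBlobStorageLocation, AzureMySqlTableDataset, DatasetResource,
    DelimitedTextDataset, LinkedServiceReference)

# Management client and names as in the setup sketch earlier in this post.
adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")
rg_name, df_name = "<resource-group>", "<data-factory-name>"

# Source dataset: a delimited text (CSV) file in Blob Storage.
source_ds = DatasetResource(properties=DelimitedTextDataset(
    linked_service_name=LinkedServiceReference(
        type="LinkedServiceReference", reference_name="SourceBlobLinkedService"),
    location=AzureBlobStorageLocation(
        container="<container>", folder_path="<folder>", file_name="<file>.csv"),
    column_delimiter=",",
    first_row_as_header=True))
adf_client.datasets.create_or_update(rg_name, df_name, "SourceCsvDataset", source_ds)

# Sink dataset: the destination table on the Azure Database for MySQL server.
sink_ds = DatasetResource(properties=AzureMySqlTableDataset(
    linked_service_name=LinkedServiceReference(
        type="LinkedServiceReference", reference_name="AzureMySqlLinkedService"),
    table_name="<destination_table>"))
adf_client.datasets.create_or_update(rg_name, df_name, "AzureMySqlSinkDataset", sink_ds)
```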

 

 

Create a Pipeline

In this step, you create a pipeline for data movement and transformation by clicking the ‘+’ (plus) sign and selecting Pipeline, as described in the documentation. In the pipeline’s Copy activity, map the source and sink datasets created in the previous step and define the column mapping to ensure there are no data conversion errors when moving data from source to target, as shown below.

 

[Screenshot: Copy activity in the pipeline with column mapping from the source dataset to the Azure Database for MySQL sink]
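Scripted, this step is a pipeline containing a single Copy activity that reads from the source dataset and writes through the Azure Database for MySQL sink. The sketch below reuses the placeholder dataset names from the previous sketch; the pipeline name and the explicit column mappings are illustrative only.

```python
# Sketch: create a pipeline with one Copy activity from the CSV dataset into the
# Azure Database for MySQL sink dataset. Names and mappings are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AzureMySqlSink, CopyActivity, DatasetReference, DelimitedTextSource,
    PipelineResource, TabularTranslator)

# Management client and names as in the setup sketch earlier in this post.
adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")
rg_name, df_name = "<resource-group>", "<data-factory-name>"

# Explicit column mapping from source columns to sink columns helps avoid
# data conversion errors; the column names here are illustrative.
translator = TabularTranslator(mappings=[
    {"source": {"name": "Id"},   "sink": {"name": "id"}},
    {"source": {"name": "Name"}, "sink": {"name": "name"}},
])

copy_activity = CopyActivity(
    name="CopyCsvToAzureMySql",
    inputs=[DatasetReference(type="DatasetReference", reference_name="SourceCsvDataset")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="AzureMySqlSinkDataset")],
    source=DelimitedTextSource(),
    sink=AzureMySqlSink(),
    translator=translator)

pipeline = PipelineResource(activities=[copy_activity])
adf_client.pipelines.create_or_update(rg_name, df_name, "CopyToAzureMySqlPipeline", pipeline)
```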

 

Finally, once the pipeline is defined and created, click Validate on the toolbar above the canvas to validate pipeline settings. Confirm that the pipeline has been successfully validated. To close the validation output, select the >> (right arrow) button.

 

Debug and Run the Pipeline

After validation, on the pipeline toolbar above the canvas, click Debug to trigger a test run. You can see the status of the pipeline run on the Output tab of the pipeline settings at the bottom.

 

Once the data pipeline run is successful, you can deploy the pipeline and trigger it manually or on a schedule. To monitor the pipeline runs, switch to the Monitor tab on the left and use the Refresh button to refresh the list, as shown below.

 

[Screenshot: the Monitor tab showing the list of pipeline runs with the Refresh button]
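The same trigger-and-monitor loop can also be done from code. The sketch below triggers the placeholder pipeline from the previous sketch, polls the run status, and then queries the activity runs for row counts and errors; the pipeline name and polling interval are assumptions.

```python
# Sketch: trigger the pipeline manually, poll the run status, and inspect the
# Copy activity output. Pipeline name and timings are placeholders.
import time
from datetime import datetime, timedelta

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import RunFilterParameters

# Management client and names as in the setup sketch earlier in this post.
adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")
rg_name, df_name = "<resource-group>", "<data-factory-name>"

run = adf_client.pipelines.create_run(rg_name, df_name, "CopyToAzureMySqlPipeline", parameters={})
while True:
    pipeline_run = adf_client.pipeline_runs.get(rg_name, df_name, run.run_id)
    if pipeline_run.status not in ("Queued", "InProgress"):
        break
    time.sleep(15)
print("Pipeline run status:", pipeline_run.status)

# Inspect the Copy activity run (rows read/written, any errors) from the last day.
filters = RunFilterParameters(
    last_updated_after=datetime.utcnow() - timedelta(days=1),
    last_updated_before=datetime.utcnow() + timedelta(days=1))
activity_runs = adf_client.activity_runs.query_by_pipeline_run(rg_name, df_name, run.run_id, filters)
for activity_run in activity_runs.value:
    print(activity_run.activity_name, activity_run.status, activity_run.output)
```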

 

 

Getting Started

We encourage you to get started by creating a data pipeline, using your preferred toolchain, to bring data from disparate data sources into Azure Database for MySQL as the destination.

 

 

For any questions or feedback about the Azure Database for MySQL service, please reach out to us at AskAzureDBforMySQL@service.microsoft.com.

 

Parikshit Savjani
Principal Program Manager, Microsoft
