Azure Data Factory allows connecting to a Git repository for source control, partial saves, better collaboration among data engineers and better CI/CD. As of this writing, Azure Repos and GitHub are supported. To enable automated CI/CD, we can use Azure Pipelines or GitHub Actions.
In this blog post, we will implement CI/CD with GitHub Actions using a workflow. A workflow is defined by a YAML (.yml) file that contains the steps and parameters that make it up.
The workflow will leverage the automated publishing capability of ADF, as well as the Azure Data Factory Deploy Action from the GitHub Marketplace, which uses the pre- and post-deployment script under the hood.
We will perform the following steps:
- Create a user-assigned managed identity and configure the federation,
- Configure the GitHub secrets,
- Create the workflow,
- Monitor the workflow execution.
Requirements:
- Azure Subscription. If you don't have one, create a free Azure account before you begin.
- Azure Data Factory instance. If you don't have an existing Data Factory, follow this tutorial to create one.
- GitHub repository integration set up. If you don't yet have a GitHub repository connected to your development Data Factory, follow the steps here to set it up.
You will need credentials that authenticate and authorize GitHub Actions to deploy your ARM template to the target Data Factory. We will leverage a user-assigned managed identity (UAMI) with workload identity federation. Workload identity federation allows you to access Azure Active Directory (Azure AD) protected resources without needing to manage secrets. In this scenario, GitHub Actions will be able to access the Azure resource group and deploy to the target ADF instance.
You need to provide your application's Client ID, Tenant ID and Subscription ID to the login action. These values can either be provided directly in the workflow or can be stored in GitHub secrets and referenced in your workflow. Saving the values as GitHub secrets is the more secure option.
| GitHub Secret | Azure Active Directory Application |
| --- | --- |
| AZURE_CLIENT_ID | Application (client) ID |
| AZURE_TENANT_ID | Directory (tenant) ID |
| AZURE_SUBSCRIPTION_ID | Subscription ID |
4. Save each secret by selecting Add secret.
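As a minimal sketch of the more secure option, the fragment below reads the three values from the GitHub secrets in the login step. It assumes the azure/login action at version v2 and a hypothetical job name; the `id-token: write` permission is what allows the job to request the OIDC token that workload identity federation relies on.

```yaml
# Workflow fragment, not a complete workflow: OIDC login using GitHub secrets.
permissions:
  id-token: write   # lets the job request an OIDC token for workload identity federation
  contents: read    # lets the job check out the repository

jobs:
  login-example:              # hypothetical job name, for illustration only
    runs-on: ubuntu-latest
    steps:
      - name: Log in to Azure
        uses: azure/login@v2  # version is an assumption; pin the one you have tested
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
```

No client secret appears anywhere in the fragment, which is the point of using federation.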
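At this point, you should already have a Data Factory instance with Git integration set up. If this is not the case, please follow the links in the Requirements section.
The workflow is composed of two jobs: a build job, which uses the @microsoft/azure-data-factory-utilities npm package to validate the Data Factory resources and export the ARM template, and a release job, which deploys that template to the target Data Factory with the Azure Data Factory Deploy Action. The build job relies on a package.json file such as the following: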
{
    "scripts": {
        "build": "node node_modules/@microsoft/azure-data-factory-utilities/lib/index"
    },
    "dependencies": {
        "@microsoft/azure-data-factory-utilities": "^1.0.0"
    }
}
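With that package.json in place, a build job could be sketched as follows. This is an illustration under stated assumptions, not the definitive workflow: the `build` folder, the `ExportedArmTemplate` artifact name, the workflow name, the trigger, the action versions and the Node.js version are all placeholders, and the factory resource ID must be filled in with your own development factory.

```yaml
# Sketch of a build job. Folder names, versions and FACTORY_ID are assumptions.
name: ADF CI/CD                 # hypothetical workflow name

on:
  push:
    branches:
      - main                    # assumption: run when changes land on main

env:
  # Replace the <...> placeholders with the resource ID of the development factory.
  FACTORY_ID: /subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.DataFactory/factories/<factory-name>

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: 18.x    # assumption: a version the utilities package supports

      - name: Install the ADF utilities package
        run: npm install
        working-directory: ${{ github.workspace }}/build   # assumed folder holding package.json

      - name: Validate the Data Factory resources
        # Arguments: root folder of the ADF resources, then the factory resource ID
        run: npm run build validate "${{ github.workspace }}" "$FACTORY_ID"
        working-directory: ${{ github.workspace }}/build

      - name: Export the ARM template
        # Third argument is the output folder, created next to package.json
        run: npm run build export "${{ github.workspace }}" "$FACTORY_ID" "ExportedArmTemplate"
        working-directory: ${{ github.workspace }}/build

      - name: Upload the ARM template as an artifact
        uses: actions/upload-artifact@v4
        with:
          name: ExportedArmTemplate
          path: ${{ github.workspace }}/build/ExportedArmTemplate
```

Uploading the exported template as an artifact is what makes it available to a later job, since each job runs on a fresh runner.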
For more details about the Azure Data Factory Deploy Action, see its GitHub Marketplace listing.
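A release job built around the Deploy Action might look like the sketch below. It assumes a preceding job named `build` that uploaded an artifact called `ExportedArmTemplate`, version v1.2.0 of the action, and the default ARM template file names produced by the export; the resource group and factory names are placeholders for your target environment.

```yaml
# Sketch of a release job; in a real workflow it sits under the same `jobs:` key as the build job.
permissions:
  id-token: write               # required for the OIDC login
  contents: read

jobs:
  release:
    needs: build                # assumes a job named `build` uploaded the artifact
    runs-on: ubuntu-latest
    steps:
      - name: Download the exported ARM template
        uses: actions/download-artifact@v4
        with:
          name: ExportedArmTemplate   # assumed artifact name

      - name: Log in to Azure
        uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
          enable-AzPSSession: true    # the deploy action needs an Azure PowerShell session

      - name: Deploy to the target Data Factory
        uses: Azure/data-factory-deploy-action@v1.2.0
        with:
          resourceGroupName: <target-resource-group>   # placeholder
          dataFactoryName: <target-factory-name>       # placeholder
          armTemplateFile: ARMTemplateForFactory.json
          armTemplateParametersFile: ARMTemplateParametersForFactory.json
```

The `needs: build` dependency keeps the deployment from starting unless validation and export have succeeded.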
Now, let’s test the setup by making some changes in the development ADF instance. Create a feature branch where you make the changes, and then make a pull request to main. This should trigger the workflow to execute.
Stay tuned for more tutorials.