Azure Data Factory CI/CD with GitHub Actions
Published Mar 15 2023 06:12 AM
Microsoft

Azure Data Factory allows connecting to a Git repository for source control, partial saves, better collaboration among data engineers and better CI/CD. As of this writing, Azure Repos and GitHub are supported. To enable automated CI/CD, we can use Azure Pipelines or GitHub Actions. 

In this blog post, we will implement CI/CD with GitHub Actions. This will be done using workflows. A workflow is defined by a YAML (.yml) file that contains the various steps and parameters that make up the workflow. 

The workflow will leverage the automated publishing capability of ADF, as well as the Azure Data Factory Deploy Action from the GitHub Marketplace, which under the hood uses the pre- and post-deployment script.

We will perform the following steps: 

- Create a user-assigned managed identity and configure the federation, 

- Configure the GitHub secrets, 

- Create the workflow, 

- Monitor the workflow execution. 

  

Requirements: 

- Azure Subscription. If you don't have one, create a free Azure account before you begin. 

- Azure Data Factory instance. If you don't have an existing Data Factory, follow this tutorial to create one. 

- GitHub repository integration set up. If you don't yet have a GitHub repository connected to your development Data Factory, follow the steps here to set it up.  
 

Create a user-assigned managed identity and configure the federation 

You will need credentials that authenticate and authorize GitHub Actions to deploy your ARM template to the target Data Factory. We will leverage a user-assigned managed identity (UAMI) with workload identity federation. Using workload identity federation allows you to access Azure Active Directory (Azure AD) protected resources without needing to manage secrets. In this scenario, GitHub Actions will be able to access the Azure resource group and deploy to the target ADF instance.

  1. In the Azure portal, browse to Managed Identities and select Create.
  2. Fill in the Resource Group, Region and Name, then select Create.
  3. Once the UAMI is created, browse to its Overview page and take note of the Subscription ID and Client ID. We will use them in the next section.
  4. Navigate to Federated credentials -> Add Credential. Choose the federated credential scenario "GitHub Actions deploying Azure resources".
  5. Fill in the Organization, Repository and Entity names. The subject identifier is composed of these values, so the entity should match how the workflow runs; for the workflow in this post, which is triggered by pushes to main, that is the main branch. Give your federated credential a name and select Save.
  6. Navigate to the Azure Active Directory Overview page and take note of the Tenant ID.
  7. Browse to the Resource Group containing the target ADF instance and assign the UAMI the Data Factory Contributor role.
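
The portal steps above can also be scripted. The following is a minimal sketch using the Azure CLI, not the method used in this post. The function names, the credential name github-actions-deploy and all argument values are hypothetical placeholders, and the sketch assumes you have already run az login with permissions to create identities and role assignments.

```shell
#!/usr/bin/env sh
# Sketch only: scripted equivalent of the portal steps, using the Azure CLI.
# Function names and argument values are hypothetical placeholders.

# Compose the federated credential subject for a branch entity.
# Form: repo:<org>/<repo>:ref:refs/heads/<branch>
federated_subject() {
  printf 'repo:%s/%s:ref:refs/heads/%s' "$1" "$2" "$3"
}

# Create the UAMI, add the GitHub federated credential, and grant the role.
# Usage: setup_uami <uami-name> <uami-rg> <region> <org> <repo> <branch> <adf-rg>
setup_uami() {
  uami_name=$1; uami_rg=$2; region=$3; org=$4; repo=$5; branch=$6; adf_rg=$7

  az identity create --name "$uami_name" --resource-group "$uami_rg" --location "$region"

  az identity federated-credential create \
    --name "github-actions-deploy" \
    --identity-name "$uami_name" \
    --resource-group "$uami_rg" \
    --issuer "https://token.actions.githubusercontent.com" \
    --subject "$(federated_subject "$org" "$repo" "$branch")" \
    --audiences "api://AzureADTokenExchange"

  # Assign Data Factory Contributor on the resource group of the target ADF.
  principal_id=$(az identity show --name "$uami_name" --resource-group "$uami_rg" --query principalId -o tsv)
  adf_rg_id=$(az group show --name "$adf_rg" --query id -o tsv)
  az role assignment create \
    --assignee-object-id "$principal_id" \
    --assignee-principal-type ServicePrincipal \
    --role "Data Factory Contributor" \
    --scope "$adf_rg_id"

  # Print the three values needed for the GitHub secrets in the next section.
  echo "AZURE_CLIENT_ID=$(az identity show --name "$uami_name" --resource-group "$uami_rg" --query clientId -o tsv)"
  echo "AZURE_TENANT_ID=$(az account show --query tenantId -o tsv)"
  echo "AZURE_SUBSCRIPTION_ID=$(az account show --query id -o tsv)"
}
```

Calling setup_uami with your own values would perform steps 1 through 7 in one go. The subject built by federated_subject has to match the branch that triggers the workflow, otherwise the login step is expected to fail.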

 
Configure the GitHub secrets 

You need to provide the Client ID, Tenant ID and Subscription ID noted in the previous section to the login action. These values can either be provided directly in the workflow or be stored in GitHub secrets and referenced in your workflow. Saving the values as GitHub secrets is the more secure option.

  1. Open your GitHub repository and go to Settings.
  2. Select Security > Secrets and variables > Actions. 
  3. Create secrets for AZURE_CLIENT_ID, AZURE_TENANT_ID, and AZURE_SUBSCRIPTION_ID. Use the values noted in the previous section for your GitHub secrets:

     GitHub Secret            Azure Active Directory Application
     AZURE_CLIENT_ID          Application (client) ID
     AZURE_TENANT_ID          Directory (tenant) ID
     AZURE_SUBSCRIPTION_ID    Subscription ID

  4. Save each secret by selecting Add secret.
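
If you prefer a terminal over the Settings page, the same three secrets can be created with the GitHub CLI. This is a hedged sketch rather than part of the original walkthrough: is_guid and set_azure_secrets are hypothetical helper names, and it assumes gh is installed and authenticated against the repository.

```shell
#!/usr/bin/env sh
# Sketch only: store the three values as repository secrets with the GitHub CLI.
# Helper names are hypothetical; gh must be installed and authenticated.

# Succeed when the argument looks like a GUID (8-4-4-4-12 hex digits).
is_guid() {
  printf '%s\n' "$1" | grep -Eq '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'
}

# Usage: set_azure_secrets <owner/repo> <client-id> <tenant-id> <subscription-id>
set_azure_secrets() {
  repo=$1
  # Reject obviously wrong values before anything is written to GitHub.
  for id in "$2" "$3" "$4"; do
    is_guid "$id" || { echo "not a GUID: $id" >&2; return 1; }
  done
  gh secret set AZURE_CLIENT_ID --repo "$repo" --body "$2"
  gh secret set AZURE_TENANT_ID --repo "$repo" --body "$3"
  gh secret set AZURE_SUBSCRIPTION_ID --repo "$repo" --body "$4"
}
```

The GUID check is only a convenience to catch copy-and-paste mistakes early; it does not verify that the IDs belong to your tenant.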

Create the workflow that deploys the ADF ARM template 

At this point, you should have a Data Factory instance with Git integration set up. If this is not the case, please follow the links in the Requirements section.

The workflow is composed of two jobs: 

  • A build job, which uses the npm package @microsoft/azure-data-factory-utilities to (1) validate all the Data Factory resources in the repository, returning the same validation errors as "Validate All" in ADF Studio, and (2) export the ARM template that will later be deployed to the QA or Staging environment.
  • A release job, which takes the exported ARM template artifact and deploys it to the ADF instance in the higher environment.
  1. Navigate to the repository connected to your ADF. Under your root folder (ADFroot in the example below), create a build folder and store the following package.json file in it:
    {
        "scripts":{
            "build":"node node_modules/@microsoft/azure-data-factory-utilities/lib/index"
        },
        "dependencies":{
            "@microsoft/azure-data-factory-utilities":"^1.0.0"
        }
    }
    [Screenshot: the build folder containing package.json in the repository]
    [Screenshot: the Git repository setup in ADF Studio, for reference]
  2. Navigate to the Actions tab -> New workflow
  3. Paste the workflow YAML attached to this blog. 
  4. Let’s walk through the parameters you need to supply. They are numbered, and comments describe what each one expects. The build job has four of them:
     
      
    on:
      push:
        branches:
        - main
    
    permissions:
      id-token: write
      contents: read
    
    jobs:
      build:
        runs-on: ubuntu-latest
        steps:
    
        - uses: actions/checkout@v3
    # Installs Node and the npm packages saved in your package.json file in the build
        - name: Setup Node.js environment
          uses: actions/setup-node@v3.4.1
          with:
            node-version: 14.x
            
        - name: install ADF Utilities package
          run: npm install
          working-directory: ${{github.workspace}}/ADFroot/build  # (1) provide the folder location of the package.json file
            
    # Validates all of the Data Factory resources in the repository. You'll get the same validation errors as when "Validate All" is selected.
        - name: Validate
          run: npm run build validate ${{github.workspace}}/ADFroot/ /subscriptions/<subID>/resourceGroups/<resourceGroupName>/providers/Microsoft.DataFactory/factories/<ADFname> # (2) The validate command needs the root folder location of your repository where all the objects are stored. And the 2nd parameter is the resourceID of the ADF instance 
          working-directory: ${{github.workspace}}/ADFroot/build
     
    
        - name: Validate and Generate ARM template
          run: npm run build export ${{github.workspace}}/ADFroot/ /subscriptions/<subID>/resourceGroups/<resourceGroupName>/providers/Microsoft.DataFactory/factories/<ADFname> "ExportedArmTemplate"  # (3) The export command, like validate, needs the root folder location of your repository where all the objects are stored. The 2nd parameter is the resource ID of the ADF instance. The 3rd parameter is the exported ARM template artifact name
          working-directory: ${{github.workspace}}/ADFroot/build
     
    # In order to leverage the artifact in another job, we need to upload it with the upload action 
        - name: upload artifact
          uses: actions/upload-artifact@v3
          with:
            name: ExportedArmTemplate # (4) use the same artifact name you used in the previous export step
            path: ${{github.workspace}}/ADFroot/build/ExportedArmTemplate
    Tip: Use the same artifact name in the Export, Upload and Download actions. 
    More details about the validate and export commands can be found here. 
  5. In the release job, there are six more numbered parameters to supply, (5) through (10):
      release:
        needs: build
        runs-on: ubuntu-latest
        steps:
        
     # we 1st download the previously uploaded artifact so we can leverage it later in the release job     
          - name: Download a Build Artifact
            uses: actions/download-artifact@v3.0.2
            with:
              name: ExportedArmTemplate # (5) Artifact name 
    
    
          - name: Login via Az module
            uses: azure/login@v1
            with:
              client-id: ${{ secrets.AZURE_CLIENT_ID }}
              tenant-id: ${{ secrets.AZURE_TENANT_ID }}
              subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
              enable-AzPSSession: true 
    
          - name: data-factory-deploy
            uses: Azure/data-factory-deploy-action@v1.2.0
            with:
              resourceGroupName: # (6) your target ADF resource group name
              dataFactoryName: # (7) your target ADF name
              armTemplateFile: # (8) ARM template file name ARMTemplateForFactory.json
              armTemplateParametersFile: # (9) ARM template parameters file name ARMTemplateParametersForFactory.json
              additionalParameters: # (10) Parameters which will be replaced in the ARM template. Expected format 'key1=value key2=value keyN=value'. At the minimum here you should provide the target ADF name parameter. Check the ARMTemplateParametersForFactory.json file for all the parameters that are expected in your scenario
             
              # skipAzModuleInstallation:  # Parameter that skips the Az module installation. Optional, default is false.

For more details about the Azure Data Factory Deploy Action, please check the GitHub Marketplace listing. 
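
Before relying on the workflow, it can help to run the build job's commands locally and to look at which parameters the exported template expects. The sketch below assumes the folder layout used in this post (ADFroot/build/package.json) and that Node.js, npm and jq are installed; adf_resource_id and local_export are hypothetical helper names, not part of the ADF utilities package.

```shell
#!/usr/bin/env sh
# Sketch only: run the build job's validate and export commands locally.
# Helper names are hypothetical; assumes Node.js, npm and jq are installed.

# Compose the ADF resource ID expected by the validate and export commands.
adf_resource_id() {
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.DataFactory/factories/%s' "$1" "$2" "$3"
}

# Usage: local_export <repo-root> <subscription-id> <resource-group> <adf-name>
# Runs in a subshell so the caller's working directory is left unchanged.
local_export() (
  root="$1/ADFroot"
  id=$(adf_resource_id "$2" "$3" "$4")
  cd "$root/build" &&
    npm install &&
    npm run build validate "$root" "$id" &&
    npm run build export "$root" "$id" "ExportedArmTemplate" &&
    # List the parameter names that additionalParameters can override.
    jq -r '.parameters | keys[]' ExportedArmTemplate/ARMTemplateParametersForFactory.json
)
```

If the list printed at the end contains factoryName, for example, the release job's additionalParameters value could take the form 'factoryName=<your target ADF name>', with further key=value pairs separated by spaces.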

 

Monitor the workflow execution 

Now, let’s test the setup by making some changes in the development ADF instance. Create a feature branch where you make the changes, then open a pull request to main. Because the workflow is triggered by pushes to main, it should execute once the pull request is merged.

  1. To check it, browse to the repository, open the Actions tab and identify your workflow.
  2. You can drill down into each run to see the jobs composing it, their statuses and durations, as well as the artifact created by the run. In our scenario, this is the ARM template created in the build job.
  3. You can drill down further by navigating to a job and its steps.
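
The same information is available from a terminal. The following sketch uses the GitHub CLI and is an optional alternative to the Actions tab; latest_run_id and watch_latest are hypothetical helper names, and the workflow file name you pass is whatever you saved the YAML as.

```shell
#!/usr/bin/env sh
# Sketch only: follow workflow runs from a terminal with the GitHub CLI.
# Helper names are hypothetical; gh must be installed and authenticated.

# Print the ID of the most recent run of a workflow.
# Usage: latest_run_id <owner/repo> <workflow-file-name>
latest_run_id() {
  gh run list --repo "$1" --workflow "$2" --limit 1 \
    --json databaseId --jq '.[0].databaseId'
}

# Watch the most recent run until it finishes, then fetch its artifact.
# Usage: watch_latest <owner/repo> <workflow-file-name>
watch_latest() {
  run_id=$(latest_run_id "$1" "$2") || return 1
  [ -n "$run_id" ] || { echo "no runs found" >&2; return 1; }
  gh run watch "$run_id" --repo "$1"
  # The artifact name matches the one used in the upload step of the build job.
  gh run download "$run_id" --repo "$1" --name ExportedArmTemplate --dir ./ExportedArmTemplate
}
```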

Stay tuned for more tutorials. 

Last update:
‎Jun 12 2023 01:51 AM