Part 6: Introducing Deployment Stacks to Azure Data Factory

Microsoft

Jun 11, 2024

Introduction

This is part 6 of our series on Azure Data Factory CI/CD. This section covers how to incorporate Azure Deployment Stacks into your Azure DevOps Pipelines.

Part 1

Architecture and Scenario
Creating resources in Azure
Create Azure Storage Containers
Create Azure Key Vaults
Create Azure Data Factory: With Key Vault Access

Part 2

Configure Azure Data Factory Source Control
Construct Azure Data Factory Data Pipeline
Publishing Concept for Azure Data Factory
Configure Deployed Azure Resources

Part 3

The YAML Pipeline Structure
The Publish Process
ARM Template Parameterization
ADF ARM Template Deployment

Part 4

How to use Azure DevOps Pipeline Templates

Part 5

How to Deploy Linked Templates for Azure Data Factory

What are Deployment Stacks?

Per the Microsoft Learn documentation, a Deployment Stack is:

An Azure deployment stack is a resource that enables you to manage a group of Azure resources as a single, cohesive unit. When you submit a Bicep file or an ARM JSON template to a deployment stack, it defines the resources that the stack manages. If a resource previously included in the template is removed, it will either be detached or deleted based on the specified actionOnUnmanage behavior of the deployment stack. Access to the deployment stack can be restricted using Azure role-based access control (Azure RBAC), similar to other Azure resources.

The TL;DR summary is that a deployment stack is an Azure resource that tracks what has been deployed as part of an ARM deployment. This allows us to delete resources that are no longer part of the deployment.

How Does This Impact Data Factory?

When we are editing and building pipelines in our Data Factory, there is often a need to remove old Datasets, Linked Services, and/or Pipelines. If you are familiar with the ADF deployment process, then you are aware that we deploy ARM templates in incremental mode.

This means that anything we remove from the Data Factory, such as a Linked Service, will still exist in the Data Factory's upper environments, which we shouldn't have portal access to. This can be seen as a security concern because it violates the principle of least privilege: the Data Factory could retain access to resources it no longer needs, particularly when leveraging things like connection strings. Additionally, there is a risk that pipelines that are no longer maintained could be triggered by accident in upper environments.

Thus, if there is a way to remove resources that are no longer defined in our collaboration branch, we should use it!
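For reference, the incremental deployment described above is typically a task along these lines. This is a minimal sketch rather than the exact task from the earlier posts: the service connection, resource group, location, file names, and the `$(subscriptionId)` variable are placeholders assumed for illustration.

```yaml
# Sketch of an incremental ARM deployment; all values are placeholders.
- task: AzureResourceManagerTemplateDeployment@3
  displayName: deploy ADF ARM template (incremental)
  inputs:
    deploymentScope: 'Resource Group'
    azureResourceManagerConnection: AzureDevServiceConnection
    subscriptionId: '$(subscriptionId)'        # assumed pipeline variable
    action: 'Create Or Update Resource Group'
    resourceGroupName: 'ResourceGroupName'
    location: 'eastus'                         # placeholder region
    templateLocation: 'Linked artifact'
    csmFile: ARMTemplateForFactory.json
    csmParametersFile: ParameterFile
    # Incremental mode adds and updates resources in the template,
    # but leaves anything missing from the template untouched.
    deploymentMode: 'Incremental'
```

Because `deploymentMode` is `Incremental`, a Linked Service or pipeline deleted from the collaboration branch simply drops out of the template and is never removed from the target factory.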

How to Implement?

So, Deployment Stacks may sound great, but how hard are they to adopt in our pipelines? Well, if you've been following along to this point and leveraging YAML Templates, not hard at all! If your pipelines aren't leveraging YAML Templates, that's alright, as the process isn't all that complicated.

First, we have to understand that implementing Deployment Stacks really just means using a different deployment command. In our previous posts we leveraged AzureResourceManagerTemplateDeployment@3. For Deployment Stacks there is no ADO task available at the time of writing, so we will leverage the Azure CLI. This can also be done with PowerShell.

By changing the deployment command, we will be telling the Azure Resource Manager to deploy our resources like we were doing before AND create a deployment stack to track them.

This is achieved by replacing our AzureResourceManagerTemplateDeployment@3 task with an Azure CLI task that executes the `az stack group create` command. Here is what the full expanded task would look like for a single environment, with just the minimum necessary.

          - task: AzureCLI@2
            displayName: create deployment stack
            inputs:
              azureSubscription: AzureDevServiceConnection
              scriptType: 'pscore'
              scriptLocation: 'inlineScript'
              # Literal block (|) keeps line breaks, so each PowerShell backtick continues the command.
              inlineScript: |
                az stack group create --name "DeploymentStackResourceName" --action-on-unmanage deleteAll `
                  --deny-settings-mode denyDelete --resource-group "ResourceGroupName" `
                  --template-file ARMTemplateForFactory.json --parameters "ParameterFile" `
                  --yes

If you want to follow along with a template, please check out the task template I have created in my YAML Template Repository. Furthermore, building on our article on leveraging deployment templates across environments, I have updated our ADF deployment job template to now deploy via stacks.

One note I will call out here: have no fear, the `create` command effectively performs an upsert. It will create the stack if it doesn't exist and update it if it already does. Thus we can keep the command as `create`.
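To give a feel for what such a step template can look like, here is a minimal sketch. It is not the template from the repository mentioned above. The parameter names and the idea of exposing the unmanage behavior as a parameter are assumptions made for illustration.

```yaml
# Hypothetical step template; parameter names are illustrative assumptions.
parameters:
  - name: serviceConnection
    type: string
  - name: stackName
    type: string
  - name: resourceGroupName
    type: string
  - name: templateFile
    type: string
  - name: parametersFile
    type: string
  - name: actionOnUnmanage
    type: string
    default: deleteAll
    values:            # the behaviors accepted by --action-on-unmanage
      - deleteAll
      - deleteResources
      - detachAll

steps:
  - task: AzureCLI@2
    displayName: create deployment stack ${{ parameters.stackName }}
    inputs:
      azureSubscription: ${{ parameters.serviceConnection }}
      scriptType: 'pscore'
      scriptLocation: 'inlineScript'
      inlineScript: |
        az stack group create --name "${{ parameters.stackName }}" `
          --resource-group "${{ parameters.resourceGroupName }}" `
          --template-file "${{ parameters.templateFile }}" `
          --parameters "${{ parameters.parametersFile }}" `
          --action-on-unmanage ${{ parameters.actionOnUnmanage }} `
          --deny-settings-mode denyDelete `
          --yes
```

Restricting `actionOnUnmanage` with `values` lets the pipeline fail at compile time on a mistyped behavior, and a cautious environment could pass `detachAll` first to see what would fall out of the stack before switching to `deleteAll`.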
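For those who prefer Azure PowerShell over the Azure CLI, a roughly equivalent task could look like the sketch below. It assumes the agent has an Az.Resources module version that includes the deployment stack cmdlets, and it reuses the same placeholder names as the CLI example.

```yaml
# Sketch only; assumes an Az.Resources version that ships the deployment stack cmdlets.
- task: AzurePowerShell@5
  displayName: create deployment stack (Azure PowerShell)
  inputs:
    azureSubscription: AzureDevServiceConnection
    ScriptType: 'InlineScript'
    azurePowerShellVersion: 'LatestVersion'
    pwsh: true
    Inline: |
      # -Force suppresses the confirmation prompt, similar to --yes in the CLI.
      New-AzResourceGroupDeploymentStack `
        -Name "DeploymentStackResourceName" `
        -ResourceGroupName "ResourceGroupName" `
        -TemplateFile "ARMTemplateForFactory.json" `
        -TemplateParameterFile "ParameterFile" `
        -ActionOnUnmanage "deleteAll" `
        -DenySettingsMode "denyDelete" `
        -Force
```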

End Result

To keep the details short: I cloned the existing pipeline 'pl_copy_data' and named the clone 'delete-me'. This was on purpose, as I wanted to see whether removing it would delete the underlying Linked Services or just the pipeline. To outline the steps up to this point:

Created a new pipeline from 'pl_copy_data' called 'delete-me'
Deployed ADF w/ the new pipeline
Removed the 'delete-me' pipeline from my git backed ADF instance
Redeployed my ADF instance

After these steps I now see the following under Deployment Stacks in the Resource Group blade:

We can see that the /factories/pipelines resource called 'delete-me' now shows in a deleted state. Just to confirm, I launched the ADF instance and looked at which pipelines are available in it:

This confirms that the delete-me pipeline has been successfully removed and that pl_copy_data and its Linked Services are still intact.
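The same check can be done from the pipeline instead of the portal. As a minimal sketch, reusing the placeholder names from the deployment task, a follow-up step could print the stack so the tracked resources appear in the pipeline log:

```yaml
# Sketch: print the deployment stack after the deployment step.
- task: AzureCLI@2
  displayName: show deployment stack
  inputs:
    azureSubscription: AzureDevServiceConnection
    scriptType: 'pscore'
    scriptLocation: 'inlineScript'
    inlineScript: |
      az stack group show --name "DeploymentStackResourceName" `
        --resource-group "ResourceGroupName" `
        --output json
```

The JSON output should list the resources the stack currently tracks, which gives a quick audit trail of each run without needing portal access to the upper environments.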

Conclusion

By introducing Deployment Stacks into our ADF CI/CD pipelines, we now have a way to automatically remove resources that are no longer being leveraged by the Data Factory via existing CI/CD processes. This is a big step toward cleaning up and securing one's Azure Data Factory environment.

Please be sure to check out any of the blogs in our Unlock the Power of Azure Data Factory: A Guide to Boosting Your Data Ingestion Process series and our series on YAML Pipelines, as well as TheYAMLPipelineOne on GitHub for additional YAML Pipeline references.

Updated Jun 11, 2024

Version 1.0
