Integrating Terraform and Azure DevOps to manage Azure Databricks
Published May 02 2022 09:31 AM

As the continuous integration and continuous delivery (CI/CD) culture grew in popularity, it brought the challenge of automating everything, with the aim of making processes easier and more maintainable for everyone.


One of the most valuable aspects of CI/CD is the integration of the Infrastructure as Code (IaC) concept. With IaC we can version our infrastructure, save money, and create new environments in minutes, among many other benefits. I won't go deeper into IaC here, but if you want to learn more, visit: The benefits of Infrastructure as Code


IaC can also bring some challenges when creating the resources needed for a project. Writing all the infrastructure scripts is a task usually assigned to the infrastructure engineers, and sometimes, for whatever reason, they are not available to help.


As a Data Engineer, I would like to help you understand the CI/CD process with a hands-on example. You'll learn how to create an Azure Databricks workspace through Terraform and Azure DevOps, whether you are creating projects by yourself or supporting your infrastructure team.


In this article, you'll learn how to integrate Azure Databricks with Terraform and Azure DevOps. I wrote it mainly because I had difficulty finding information that covers these three technologies together.


First of all, you'll need some prerequisites:


  • Azure Subscription
  • Azure Resource Group (you can use an existing one)
  • Azure DevOps account
  • Azure Storage Account with a container named "tfstate"
  • Visual Studio Code (it's up to you)

So, let's start and have some fun


Go ahead and download or clone the GitHub repository databrick-tf-ado and check out the demo-start branch.

In the root folder you'll see a Terraform file, plus two more files in the folder modules/databricks-workspace.




Note that this example is a basic one; you can find more information about all the features available for Databricks at this link:


Now go to the file in the root folder and find line 8, where the azurerm backend declaration starts:




  backend "azurerm" {
    resource_group_name  = "demodb-rg"
    storage_account_name = "demodbtfstate"
    container_name       = "tfstate"
    key                  = "dev.terraform.tfstate"
  }




There you need to change the values of resource_group_name and storage_account_name to match your own subscription. You can find those values in the Azure Portal; both resources must already exist.
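For orientation, here is a minimal sketch of how a backend block like this one usually sits inside the top-level terraform block. It is not a copy of the repository's file: the provider sources and the empty features block are assumptions about a typical setup, and the names are the demo values that you would replace with your own.

```hcl
terraform {
  # Assumed provider sources; check the repository for the ones it actually pins.
  required_providers {
    azurerm = {
      source = "hashicorp/azurerm"
    }
    databricks = {
      source = "databricks/databricks"
    }
  }

  # Remote state: a blob named by "key" inside the "tfstate" container.
  backend "azurerm" {
    resource_group_name  = "demodb-rg"      # replace with your resource group
    storage_account_name = "demodbtfstate"  # replace with your storage account
    container_name       = "tfstate"
    key                  = "dev.terraform.tfstate"
  }
}

# The azurerm provider requires a features block, even an empty one.
provider "azurerm" {
  features {}
}
```

Terraform does not allow variables inside a backend block, which is why these values are edited directly in the file (or supplied when running init) rather than passed in as module inputs.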





The file in the root folder references a module called "databricks-workspace". In that module's folder you can see two more files. They contain the definitions needed to create a Databricks workspace, a cluster, a secret scope, a secret and a notebook, in the format that Terraform requires, along with the values that can change depending on the environment.
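To give an idea of what such definitions can look like, here is a hedged sketch of the five resources using the azurerm and databricks providers. It is not the repository's code: every resource label, variable name, path and value below is hypothetical, the layout is flattened into a single file for readability, and the premium SKU is an assumption.

```hcl
# Hypothetical inputs: the values that change per environment.
variable "resource_group_name" { type = string }
variable "location" { type = string }
variable "workspace_name" {
  type    = string
  default = "demodb-workspace"
}
variable "secret_value" {
  type      = string
  sensitive = true
}

# The Azure Databricks workspace itself.
resource "azurerm_databricks_workspace" "this" {
  name                = var.workspace_name
  resource_group_name = var.resource_group_name
  location            = var.location
  sku                 = "premium" # assumption
}

# Point the databricks provider at the workspace created above.
provider "databricks" {
  host                        = azurerm_databricks_workspace.this.workspace_url
  azure_workspace_resource_id = azurerm_databricks_workspace.this.id
}

# Look up a long-term-support runtime and a small node type.
data "databricks_spark_version" "lts" {
  long_term_support = true
  depends_on        = [azurerm_databricks_workspace.this]
}

data "databricks_node_type" "smallest" {
  local_disk = true
  depends_on = [azurerm_databricks_workspace.this]
}

resource "databricks_cluster" "demo" {
  cluster_name            = "demo-cluster"
  spark_version           = data.databricks_spark_version.lts.id
  node_type_id            = data.databricks_node_type.smallest.id
  autotermination_minutes = 20
  num_workers             = 1
}

resource "databricks_secret_scope" "demo" {
  name = "demo-scope"
}

resource "databricks_secret" "demo" {
  key          = "demo-key"
  string_value = var.secret_value
  scope        = databricks_secret_scope.demo.name
}

resource "databricks_notebook" "demo" {
  path           = "/Shared/demo-notebook"
  language       = "PYTHON"
  content_base64 = base64encode("print('hello from Terraform')")
}
```

Keeping the environment-specific values in variables is what lets the same module be reused for another environment by passing different inputs.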


Now that you have changed the values mentioned above, upload the code to a GitHub or Azure DevOps repository. If you need assistance with that, visit these pages: GitHub or DevOps.


At this point we have our GitHub or Azure DevOps repository configured with the names we require, so let's create the pipeline that deploys our Databricks environment into our Azure subscription.


First, go to your Azure subscription and check that you don't already have a Databricks workspace called demodb-workspace.





You'll need to install an extension so that Azure DevOps can run Terraform commands, so go to Terraform Extension.


Once it is installed, go to your project in Azure DevOps, click Pipelines, then Releases, and create a "New pipeline". You are offered the option of creating the pipeline with YAML or with the editor; I'll choose the editor so we can see each step more clearly.





In the Artifacts section of the pipeline, click "Add an artifact", select your source type (the provider where you uploaded your repository), fill in all the required information as in the image below, and click "Add".





Then click "Add stage" in the Stages section, choose "Empty job" and name the stage "DEV".




After that, click on Jobs below the name of the stage.



In the Agent job, press the "+" button, search for "terraform" and select "Terraform tool installer".



Leave the default settings.


Then add three more tasks of type "Terraform".




Name the second task, the one right after the installer, "Init" and fill in the required information as in the image:





For all three of these tasks, set your subscription, resource group, storage account and container. There is also a field labeled key; set it to "dev.terraform.tfstate". This key is the name of the state file in the container, which Terraform uses to keep track of your infrastructure changes.




Name the next task "Plan".




Name the last task "Apply".




Now change the name of your pipeline and save it




Now we only need to create a release to test it.


You can monitor the progress





When it finishes, if everything went well, you'll see your pipeline marked as successful.




Lastly, let's confirm in the Azure Portal that everything was created correctly.



Then log in to your workspace to check that the cluster, the scope, the secret, and the notebook were created and are working correctly.





With that, you can easily keep your environments safe from changes made by contributors, since there is only one way for modifications to reach your infrastructure.


Let us know if you have any comments or questions.









Last update: May 02 2022 12:38 PM