Here's a quick review of the road so far:
CI CD in Azure Synapse Analytics Part 1
- Creating an Azure DevOps project
- Linking our Azure Synapse Analytics environment to that Project via Git
- Validating that our Azure DevOps Repo was populated with our Azure Synapse Analytics environment
CI CD in Azure Synapse Analytics Part 2:
- Create a new branch on our Repo
- Edit our Azure Synapse Analytics environment
- Specifically, my SQL scripts had demos all over the place, and Buck Woody said I had to clean up my very messy room .... er, Azure Synapse Analytics environment
- Create a Pull Request in Azure Synapse Analytics to merge our new branch with the main
- Approve the Pull Request in Azure DevOps
- Validate our main branch is updated in our Azure Synapse Analytics Environment
CI CD in Azure Synapse Analytics Part 3
- Create an Artifact pipeline
- This is to create an Artifact we can use to deploy to another environment
This time we will:
- Give our Azure DevOps Service Principal access to our Azure Synapse Workspace
- Validate or grant our Azure DevOps Service Principal the Storage Blob Data Contributor & User Access Administrator roles (*This is only needed if your storage account was provisioned before you created your Synapse Workspace, or if you connect your Dev, QA, and Prod to the same ADLS Gen2 storage account. If you create your Synapse Workspace and Storage Account from an ARM template deployed from DevOps, then your DevOps Service Principal will have Owner on the Storage Account, which gives the Service Principal User Access Administrator capabilities.)
- Create the release pipeline
- Validate the deployment
- *If you have Provisioned SQL Pools as part of your deployment, pause them after the deployment completes, because they will be created and running on deployment
Before we create our release pipeline, we need to make sure the Azure Service Principal account has the proper permissions. If it does not, you will get cryptic errors with a GUID and something about "does not have permissions to blah blah blah". Trust me, it is super annoying.
The two permissions we will need are located in two different places. The first is in our Azure Synapse Workspace, specifically using Azure Synapse Studio. The second is in the Storage Account for our Azure Data Lake Gen 2 that is the default ADLS connection for our Azure Synapse Studio.
First open your Azure Synapse Studio and navigate to the Management Blade.
Now Click on Access Control. The user that created the Azure Synapse Workspace is automatically given the role of Synapse Administrator, the second user with that role will be the Managed Identity for the Azure Synapse Workspace. We need to add our Azure DevOps Service Principal to this role.
*A quick note, I prefer to manage this with an Azure Active Directory group. In this blog I will show how to add the account directly to the Azure Synapse Workspace. However, the best practice would be to have an AAD group that is granted Synapse Administrator, and then add the Service Principal to that group.
Click the +Add link.
Now we will type in the Azure DevOps Project name. If your organization URL was https://bobsburgers.visualstudio.com and your project name was Azure Synapse Studio CI CD, you would type in bobsburgers-Azure Synapse Studio CI CD. This will show you the Service Principal GUID following that name.
Click the name as it appears and then click the Apply Button.
Now open a browser and navigate to the Azure Portal. In the search window at the top, type Storage Accounts. Select the storage account that you are using as the default ADLS storage account for your Azure Synapse Workspace. Click the Access Control (IAM) blade. Click +Add, then click Add role assignment.
Select Storage Blob Data Contributor for the Role. In the Select text box, type in the Azure DevOps Service Principal the same way we did for the Synapse Administrator role.
Repeat the previous steps, except this time specify the User Access Administrator Role.
Next we will navigate to our Azure DevOps Project. Select Pipelines, then Releases, then New pipeline.
Click on Empty Job. Then click on Add an artifact.
Ensure our project is selected. Select the name of the Build Pipeline that we created in our previous blog (Or whatever YOU wanted to name your Build Pipeline because my naming conventions do not define you!). Select the Latest Build, and click Add.
Rename the Release Pipeline to reflect what we are doing. We selected Deploy Dev Release. Click on the Stage 1 link that reads "1 job, 0 task".
Click the + plus sign on Agent Job. In the search text box type "Synapse", the Synapse workspace deployment task will appear if you have installed it from the Marketplace. If not, FEAR NOT! You should see a link for it below under the heading Marketplace. Click on it to install the task to your Azure DevOps project.
Once you have added the task, click on it. We will fill out the Template, Template parameters, Synapse Workspace connection type, and Synapse Workspace name; we will get to OverrideArmParameters in a moment, as that one takes a lot more detail.
First, click on the ... ellipsis button next to Template.
Navigate through the build pipeline, ASW_Drop, ARM, to the TemplateForWorkspace.json. Select the .json file and click OK.
Now repeat the same steps for the Template Parameters text box, this time selecting the TemplateParametersForWorkspace.json file.
Under Synapse workspace connection type, select the Azure Subscription that contains the environment where we are deploying our release. Specify the Resource Group and the Azure Synapse Workspace name.
Now we begin to focus on the override parameters. First we will travel back to our Repo and look at the TemplateForWorkspace.json. Any parameter that has the type "secureString" will need an override parameter. Depending on the level of development you have done, you may have many of these; in our example we have two: the default workspace connection to the Provisioned SQL Pools and a Linked Service I created to an Azure SQL Database.
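If you want to double-check which parameters need overrides, you can scan the template for them. Here is a minimal Python sketch; it inlines a trimmed, hypothetical sample of the template's "parameters" section (the names are this post's example names) rather than loading the full file:

```python
import json

# Hypothetical, trimmed sample of the "parameters" section of TemplateForWorkspace.json.
# In practice, load the real file instead:
#   template = json.load(open("TemplateForWorkspace.json"))
template = {
    "parameters": {
        "workspaceName": {"type": "string"},
        "bballasw-WorkspaceDefaultSqlServer_connectionString": {"type": "secureString"},
        "Lahman_connectionString": {"type": "secureString"},
    }
}

# Every parameter typed "secureString" needs an override in the release pipeline.
secure_params = [
    name
    for name, spec in template["parameters"].items()
    if spec.get("type", "").lower() == "securestring"
]
print(secure_params)
```

Each name this prints is one override you will add to the OverrideArmParameters text area later.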
Dear Reader, you are wondering where to find those. You are in luck! Navigate to your Azure Synapse Analytics Workspace, click on the Manage blade, then Linked services. Now click on the { } Code symbol next to the name of the linked service that has a type of secureString.
This will open a view of the JSON that contains the data we need. Copy the text between the double quotes. DO NOT SELECT THE DOUBLE QUOTES!! JUST THE STRING BETWEEN THE DOUBLE QUOTES!
Sorry for yelling, but we will use this string soon and the double quotes "" will cause it to fail.
Now let's do the same thing for the Azure SQL Database.
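If you would rather not hand-copy the value (and risk grabbing those quotes), you can parse the JSON and print just the string. A small Python sketch, assuming the linked-service JSON keeps the connection string under properties.typeProperties; the JSON below is a made-up example, and the exact shape may differ for your linked service type:

```python
import json

# Hypothetical linked-service JSON, as shown by the { } Code view in Synapse Studio.
# Structure assumed; check yours before relying on these property names.
linked_service = json.loads("""
{
  "name": "Lahman",
  "properties": {
    "type": "AzureSqlDatabase",
    "typeProperties": {
      "connectionString": "Data Source=myserver.database.windows.net;Initial Catalog=Lahman;"
    }
  }
}
""")

# json parsing strips the surrounding double quotes for us.
conn = linked_service["properties"]["typeProperties"]["connectionString"]
print(conn)
```

The printed value has no surrounding quotes, which is exactly what the pipeline variable needs.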
Now navigate back to our Azure DevOps Release pipeline. Click on Variables, then click the + Add button three times. We will be creating two variables based on the secureStrings in our JSON file. We will also be creating a system.debug variable to give us extra information in our release pipeline; its value is True.
After you paste in the secureStrings, click the lock button next to the two connection string variables; leave system.debug unencrypted.
Your pipeline should look similar to this.
Now we will go to the OverrideArmParameters text area. We will use the following syntax: -variableNameFromTheJsonFile $(devOpsPipelineVariable)
For example:
-bballasw-WorkspaceDefaultSqlServer_connectionString $(WorkspaceDefault) -Lahman_connectionString $(Lahman)
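If you have more than a couple of secureStrings, a tiny script can build that string for you. This is just a sketch using this post's example names; the mapping from template parameter to pipeline variable is yours to fill in:

```python
# Map each secureString parameter name from TemplateForWorkspace.json to the
# release-pipeline variable holding its value (example names from this post).
overrides = {
    "bballasw-WorkspaceDefaultSqlServer_connectionString": "WorkspaceDefault",
    "Lahman_connectionString": "Lahman",
}

# Build the value for the OverrideArmParameters text area:
# -parameterName $(pipelineVariable), space-separated.
override_arm_parameters = " ".join(
    f"-{param} $({var})" for param, var in overrides.items()
)
print(override_arm_parameters)
```

Paste the printed line straight into the OverrideArmParameters text area.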
Yours may vary based on your number of secureStrings and their names. Now let's click Save on our pipeline.
Make a comment and click OK
Now click Create release.
Click Create
Click Release-1 (or whatever your release number is).
After your Agent begins to process click on Logs and watch it run!
AND NOW!!!!! ......it failed.
A few times. But hey, it's not developing if there isn't a failure. So it's almost 1 am, and I *believe* I have it running so let me take this time to walk you through what I've found.
Spark Pools and Self-Hosted Integration Runtimes are not created by the release pipeline. If you have a Linked Service that uses a Self-Hosted Integration Runtime, you will need to manually create that in your QA or Prod environment prior to deployment.
If you are developing Notebooks and have them connected to a Spark Pool, you will need to recreate that Spark Pool in QA or Production. Notebooks that are linked to a Spark Pool that does not exist in an environment will fail to deploy.
Name them the same thing. Do not change names. Trust me.
If you are doing a deployment and your Provisioned SQL Pools are paused, the deployment will fail. *More to come on database migrations; a database project build and release is still needed.
Here's a quick image.
I'm on release 4, attempt 2. This appears to be running just fine for me.
VICTORY!!
Now let us go and check our QA Workspace! First up Scripts and Notebooks.
Excellent! Everything is there. Next let us look at our Provisioned SQL Pools.
Looks great! As a quick side note, the databases will be brought over at DW100c, so you can scale them up as needed. Also, if you are in a demo environment like me, be sure to pause them after the deployment completes. Next up, Pipelines.
I like this! Now let's check out linked services.
I don't like this. Here are my Dev links for my default workspace in my QA environment. Right now the only way I've found to clean this up is to use the Az.Synapse PowerShell Module. Navigate back to your release pipeline. Edit it, add an Azure PowerShell task. We will then use this script:
##Required for Azure DevOps initial deployment
Install-Module Az.Synapse -RequiredVersion 0.2.0 -Scope CurrentUser -Force
#Get rid of the Dev linked services in QA (one line per linked service to remove)
Remove-AzSynapseLinkedService -WorkspaceName yourWorkspaceName -Name firstLinkedServiceToRemove
Remove-AzSynapseLinkedService -WorkspaceName yourWorkspaceName -Name secondLinkedServiceToRemove
Under Azure PowerShell version options select Specify other version, set Preferred Azure PowerShell Version 3.1.0.
The next time you run your deployment, this should clean up those links.
All right Dear Reader, I'm off to sleep. Happy Monday and as always, thank you for stopping by.
Thanks,
Brad