CI CD in Azure Synapse Analytics Part 4 - The Release Pipeline

Published 01-31-2021 09:58 PM 4,406 Views

Here's a quick review of the road so far:

 

CI CD in Azure Synapse Analytics Part 1

  • Creating an Azure DevOps project
  • Linking our Azure Synapse Analytics environment to that Project via Git
  • Validating that our Azure DevOps Repo was populated with our Azure Synapse Analytics environment

CI CD in Azure Synapse Analytics Part 2:

  • Create a new branch on our Repo
  • Edit our Azure Synapse Analytics environment
    • Specifically my SQL scripts have demos all over the place and Buck Woody said I have to clean up my very messy room .... Azure Synapse Analytics environment
  • Create a Pull Request in Azure Synapse Analytics to merge our new branch with the main
  • Approve the Pull Request in Azure DevOps
  • Validate our main branch is updated in our Azure Synapse Analytics Environment

CI CD in Azure Synapse Analytics Part 3

  • Create an Artifact pipeline
    • This is to create an Artifact we can use to deploy to another environment

This time we will:

  • Give our Azure DevOps Service Principal access to our Azure Synapse Workspace
  • Validate or Give our Azure DevOps Service Principal Storage Contributor & User Access Administrator (*This is only if your storage account was provisioned before you created your Synapse Workspace or if you connect your Dev, QA, and Prod to the same ADLS G2 storage account.  If you create your Synapse Workspace and Storage account from an ARM template deployed from DevOps, then your DevOps Service Principal will have Owner on the Storage Account and that gives the Service Principal User Access Administrator capabilities.)
  • Create the release pipeline
  • Validate the deployment
  • *If you have SQL Provisioned Pools as part of your deployment pause them, because they will be created on deployment

Before we create our release pipeline we need to make sure the Azure Service Principal account has the proper permissions.  If we do not, you well get cryptic errors with a GUID and something about "does not have permissions to blah blah blah".  Trust me, it is super annoying.

 

The two permission we will need are located in two different places, the first is in our Azure Synapse Workspace, specifically using Azure Synapse Studio.  The second will be in the Storage Account for our Azure Data Lake Gen 2 that is the default ADLS connection for our Azure Synapse Studio.

 

First open your Azure Synapse Studio and navigate to the Management Blade.

 

CI CD Release Pipeline 01.png

 

Now Click on Access Control.  The user that created the Azure Synapse Workspace is automatically given the role of Synapse Administrator, the second user with that role will be the Managed Identity for the Azure Synapse Workspace.  We need to add our Azure DevOps Service Principal to this role.

 

*A quick note, I prefer to manage this in an Azure Active Directory Group. In this blog I will show how to add the account directly to the Azure Synapse Workspace.  However, the best practice would be to have an AAD group that is granted Synapse Administrator and then add the role to that group.

 

Click the +Add link.

 

CI CD Release Pipeline 02.png

 

Now we will type in the Azure DevOps Project name.  If your project URL was https://bobsburgers.visualstudio.net and your project name was Azure Synapse Studio CI CD you would type in bobsburgers-Azure Synapse Studio CI CD.  This would show you the Service Principal GUID following that name. 

 

Click the name as it appears and then click the Apply Button.

 

CI CD Release Pipeline 03.png

Now open a browser and navigate to the Azure Portal.  In the search window at the type Storage Accounts.  Select the storage account that you are using as your default ADLS Storage Account for your Azure Synapse Workspace.  Click the Access Control (IAM) blade.  Click +Add, then click Add role assignment.

 

CI CD Release Pipeline 05.png

 

Select Storage Blob Data Contributor for Role.  In the Select text box, type in the Azure DevOps Service Principal the same way we did in the for the Synapse Administrator role.

 

CI CD Release Pipeline 06.png

 

Repeat the previous steps, except this time specify the User Access Administrator Role.

 

CI CD Release Pipeline 07.png

 

Next we will navigate to our Azure DevOps Project.   Select pipelines, Releases, and New Pipeline.

 

CI CD Release Pipeline 08.png

 

Click on Empty Job.  Then click on Add an artifact.

 

CI CD Release Pipeline 09.png

 

Ensure our project is selected.  Select the name of the Build Pipeline that we created in our previous blog (Or whatever YOU wanted to name your Build Pipeline because my naming conventions do not define you!).  Select the Latest Build, and click Add. 

CI CD Release Pipeline 10.png

 

Rename the Release Pipeline to reflect what we are doing.  We selected Deploy Dev Release.  Clock on the Stage1 link 1 job, 0 task

 

CI CD Release Pipeline 11.png

Click the + plus sign on Agent Job.  In the search text box type "Synapse", the Synapse workspace deployment task will appear if you have installed it from the Marketplace.  If not, FEAR NOT!  You should see a link for it below under the heading Marketplace.  Click on it to install the task to your Azure DevOps project.

 

CI CD Release Pipeline 12.png

 

Once you have added the task, click on the task.  We will fill out the Template, Template parameters, Synapse Workspace connection type, Synapse Workspace name, and we will get to OverrideArmParameters in a moment.  That will a lot more details.

 

First click on the ... ellipses by Template. 

CI CD Release Pipeline 13.png

 Navigate through the build pipeline, ASW_Drop, ARM, to the TemplateForWorkspace.json.  Select the .json file and click OK.

 

CI CD Release Pipeline 14.png

 

Now repeat the same steps for the Template Parameters text box, this time selecting the TemplateParametersForWorkspace.json file.

 

CI CD Release Pipeline 15.png

 

 

Under Synapse workspace connection type, select the Azure Subscription that contains the environment where we are deploying our release.  Specify the Resource Group and the Azure Synapse Workspace name.

 

CI CD Release Pipeline 16.png

Now we begin to focus on the override parameters.  First we will travel back to our Repo and look at the TemplateForWorkspace.json.  Any string that has a type "secureString" will need to have an override parameter. Depending on the level of development you have done, you may have many of these strings, in our example we have two.  The default workspace connection to the Provisioned SQL Pools and a Linked Service I created to an Azure SQL Database.

 

CI CD Release Pipeline 17.png

 

Dear Reader, you are wondering where to find those.  You are in luck!  Navigate to your Azure Synapse Analytics Workspace, click on the Manage blade, then Linked services.  Now click on the { } Code symbol after the name of the linked service that is a type securedString.  

CI CD Release Pipeline 18.png
This will open a view of the JSON in that contains the data we need.  Copy the text between the double quotes.  DO NOT SELECT THE DOUBLE QUOTES!! JUST THE STRING BETWEEN THE DOUBLE QOUTES!  

Sorry for yelling, but we will use this string soon and the double quotes "" will cause it to fail.

CI CD Release Pipeline 19.png

 

Now let's do the same thing for the Azure SQL Database.

CI CD Release Pipeline 20.png

 

 

Now navigate back to our Azure DevOps Release pipeline.  Click on Variables, then click the + Add button 3 times.  We will be creating two variables based on the secureStrings in our JSON file.  We will also be creating a system.debug value to give us extra information in our release pipeline, it's value is True.

After you copy in the secureStrings, click the lock button by the two connection strings, leave system.debug unencrypted.

CI CD Release Pipeline 22.png

Your pipeline should look similar to this.

CI CD Release Pipeline 22b.png

 

 

Now we will go to the OverrideArmParameters text area.  We will use the following syntax                                -variableNameFromTheJsonFile $(devOpsPipelineVarriable)

For example:

            -bballasw-WorkspaceDefaultSqlServer_connectionString $(WorkspaceDefault) -Lahman_connectionString $(Lahman)

CI CD Release Pipeline 23.png

Yours may vary based your number of secureStrings and names.  Now let's click Save on our pipeline.

CI CD Release Pipeline 24.png

Make a comment and click OK

CI CD Release Pipeline 25.png

Now click Create release.

CI CD Release Pipeline 26.png

 

Click Create

 

CI CD Release Pipeline 27.png

Click Release-1 (or whatever your release number is).

CI CD Release Pipeline 28.png

 

After your Agent begins to process click on Logs and watch it run!

 

CI CD Release Pipeline 29.png

 

 

AND NOW!!!!! ......it failed.

 

A few times.  But hey, it's not developing if there isn't a failure.  So it's almost 1 am, and I *believe* I have it running so let me take this time to walk you through what I've found.

Spark Pools and Self Hosted Integration Runtimes are not created in a pipeline.  If you have a Linked Service that uses a Self Hosted Integration Runtime you will need to manually create that in your QA or Prod environment prior to deployment. 

If you are developing Notebooks and have them connected to a Spark Pool, you will need to recreate that Spark Pool in QA or Production.  Notebooks that are linked to a Spark Pool that does not exist in an environment will fail to deploy.

 

Name them the same thing.  Do not change names.  Trust me.  

If you are doing a deployment and your Provisioned SQL Pools are Paused then the deployment will fail.  *More to come on database migrations, a database project build, and release is still needed.

 

Here's a quick image. 

CI CD Release Pipeline 30.png

I'm on release 4, attempt 2.  This appears to be running just fine for me.

VICTORY!!

 

CI CD Release Pipeline 31.png

 

Now let us go and check our QA Workspace!  First up Scripts and Notebooks.

 

CI CD Release Pipeline 32.png

Excellent!  Everything is there.  Next let us look at our Provisioned SQL Pools.

CI CD Release Pipeline 33.png

Looks great!  As a quick side note the databases will be brought over at DW100c, so you can auto scale them as needed.  Also if you are in a demo environment like me, be sure to pause them after the deployment completes.   Next up Pipelines.

 

CI CD Release Pipeline 34.png

I like this!  Now let's check out linked services.

 

CI CD Release Pipeline 35.png

 

I don't like this.  Here are my Dev links for my default workspace in my QA environment.  Right now the only way I've found to clean this up is to use the Az.Synapse PowerShell Module.  Navigate back to your release pipeline.  Edit it, add an Azure PowerShell task.  We will then use this script:

 

 

##Required for azure devops initial deployment
Install-Module Az.Synapse -RequiredVersion 0.2.0 -Scope CurrentUser -Force

#get rid of dev linked service in QA
Remove-AzSynapseLinkedService -WorkspaceName yourworkspaceName -Name linkedservicetoRemove
Remove-AzSynapseLinkedService -WorkspaceName yourworkspaceName -Name linkedservicetoRemove

 

Under Azure PowerShell version options select Specify other version, set Preferred Azure PowerShell Version 3.1.0.  

 

The next time you run your deployment, this should clean up those links.

CI CD Release Pipeline 36.png

 

All right Dear Reader, I'm off to sleep.  Happy Monday and as always, thank you for stopping by.

Thanks,

 

Brad

 

 

 

 

 

13 Comments
Regular Visitor

Hi Brad,

Thanks for this new article and all the work you've done, I can't wait to give it a try and hopefully get this working finally! I can already see places I went wrong. :D

 

 

Thank you @KeyBee !  Good Luck :)!

Microsoft

Thank you Brad for this article! YOU SAVED MY DAY! I spent already hours to get the thing running...

Senior Member

Hi Brad,

 

Thanks a lot for the blog. It explains my questions on why it ignores my Spark pools when deploying to another workspace.

One question though, why do we have to override those parameters with type secureString? Before reading this I had tried a few times without overriding any parameters and didn't see any obvious error (might have missed something?)

 

And also fyi, my workspace studio publishes the resources into a separate branch named 'workspace_publish' so we have to deploy from there. I'm not sure why yours uses the same branch.

Hi @MinhDNguyen By Default, unless you change it to main, the default publish branch will be Workplace_Publish.  In part 1 I left this as the default, but you can change it, even after deployment.

In the Management Blade you would click on Git Configuration, then click at the Gear icon at the top and it will open a blade on the right hand screen.  you can change workplace publish to main.

 

Other wise every time you publish your JSON in Azure DevOps, it will publish the newest JSON to that Workplace_Publish branch.  You would need to build off of that branch instead of main.  It's get's a bit confusing to me, but my Buddy Steve St. Jean (to whom I owe teaching me how to do this prior to Winter Break in December) who is one of the best Developers I've ever met, assured me this was the way to go.  And if I got any of this wrong, it is my fault and not Steve's, I'm just doing a bad job re-explaining this :). 

Senior Member

Thank you Brad for your explanation. Additionally, I would like to ask: why do we have to override those secureString parameters?

I had tried to deploy a few times without overriding any parameters before coming across this article, and didn't see any obvious issues. I might have missed something though....

Hi @MinhDNguyen In ARM secureStrings are a data type requiring an encrypted value.  It is the most common and popular way of handling secrets in ARM template parameters. If you  have a field in an ARM template that specifies a parameter of type secureString, you must pass it an encrypted value.  It cannot be null, it cannot be in plain text. 

 

Instead of inserting the text you can use Azure Key Vault to pass strings, you can also create variable groups that could be assigned per environment so you can have different variables for QA and Prod.  I plan to get into those things eventually but not this early in the series. 

I cannot explain why you are not getting this error, this was the first error I ran into that had to be fixed.  I'm glad you are not receiving errors :).

Regular Visitor

Thank you so much for this.  I was attempting to use documentation but could get none of the steps in it to work.  This series has been a breath of clarity.

 

 

Regular Visitor

Hi @Bradley Ball (MICROSOFT) quick question about the powershell task that removes the dev linked services.  I had a few issues with it if I'm honest, I found that with

Install-Module Az.Synapse -RequiredVersion 0.2.0 -Scope CurrentUser -Force

Remove-AzSynapseLinkedService was not in the list of cmdlets imported, changing to version 0.7.0 solved this for me.  I also had to add a -Force command to the  Remove-AzSynapseLinkedService task:

$linkedService | Remove-AzSynapseLinkedService  -Force

All of these were surmountable and as you said

But hey, it's not developing if there isn't a failure. 

What I did find however was that the Linked service that had Related objects (the default storage in my case) is not removed. 

Before

BeforeBefore

After
AfterAfter

How did you solve that issue if indeed you could? I noticed in your image there is 1 related object I assume its an Integration dataset. Have you a way to change the name of the service that Integration datasets are pointing to? 

Senior Member

Hello Bradley , Thanks a lot for sharing! All steps I've done and it works perfectly.

 

 

Microsoft

Hi @Bradley Ball (MICROSOFT) ,

Thank you for publishing this series.

In the official CI/CD documentation, there are two tasks before the Synapse Workspace deployment task. One is for deploying an ARM template. Though it's not explicitly stated, I take it this is to create or update the actual Synapse Workspace itself, before deploying the contents?

Can you elaborate on why that's not in your walkthrough, please?

 
 
Frequent Visitor

Thank you for the post. In your response to @MinhDNguyen I understand your perspective but there are still issues with that approach. There are valuable insights that we lose when we don't maintain lineage between main and our releases, like user story/task/bug tracking. Right now, this is proving to be challenging, because when I link and track changes with my release pipeline, I lose all insight on the release, but I can track those changes for my database project for the Synapse dedicated SQL Pool, maintained in a separate repo with independent build and releases. My concern is that for larger teams where we have policy protected branches, we lose the ability to merge these changes back into main with the manual publish, to do this we would need an additional pull request. The Azure Data Factory team has a much better approach to this issue. Automated publishing for continuous integration and delivery - Azure Data Factory | Microsoft Docs I think the biggest advantage to an approach where we leverage a more traditional build pipeline is that it empowers the engineering team to leverage the most appropriate branching strategy for their teams and is more suitable for managing hotfixes without a ton of manual work to deploy changes.

Senior Member

Hi Brad,

 

In this example we can see that the provisioned SQL pools databases are populated in the new workspace. How about the SQL on demand databases? Are they propagated from the source workspace too? In my experience, they are not. I do hope that I am missing something on this because I need the SQL on demand databases :cry:

Version history
Last update:
‎Jan 31 2021 09:56 PM