Data Pipelines
CI/CD in Azure Synapse Analytics Part 3
In this edition of our series, we will create an Artifact (Build) pipeline, create a Release pipeline, and deploy our Azure Synapse Analytics environment from Dev to QA. We will review what was created and what was NOT created, and also cover pausing your SQL Pools when you are not using them.
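As a quick aside on that last tip, pausing a dedicated SQL pool can also be scripted. The following is a minimal PowerShell sketch using the Az.Synapse module; the resource group, workspace, and pool names are placeholders rather than values from the article.

# Minimal sketch: pause a dedicated SQL pool when it is not in use.
# Requires the Az.Synapse module and an authenticated session (Connect-AzAccount).
Import-Module Az.Synapse

$resourceGroup = 'rg-synapse-dev'      # placeholder resource group
$workspaceName = 'synapse-dev-ws'      # placeholder Synapse workspace
$sqlPoolName   = 'dedicatedpool01'     # placeholder dedicated SQL pool

# Only pause the pool if it is currently online.
$pool = Get-AzSynapseSqlPool -ResourceGroupName $resourceGroup -WorkspaceName $workspaceName -Name $sqlPoolName
if ($pool.Status -eq 'Online') {
    Suspend-AzSynapseSqlPool -ResourceGroupName $resourceGroup -WorkspaceName $workspaceName -Name $sqlPoolName
}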
Cross Subscription Database Restore for SQL Managed Instance Database with TDE enabled using ADF

Our customers require daily refreshes of their production database to the non-production environment. The database, approximately 600 GB in size, has Transparent Data Encryption (TDE) enabled in production. Disabling TDE before performing a copy-only backup is not an option, as disabling and re-enabling it would take hours. To meet this requirement, we use a customer-managed key stored in Key Vault, and Azure Data Factory is used to schedule and execute the end-to-end database restore process.
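As a rough illustration, the copy-only backup step that such a process orchestrates might look like the PowerShell sketch below. The server name, storage URL, and database name are hypothetical placeholders; it assumes the managed instance already has a credential for the storage container and that TDE uses a customer-managed key, so the copy-only backup can run without disabling TDE.

# Minimal sketch: take a copy-only backup of a TDE-enabled database to Azure Blob Storage.
# Placeholder names throughout; ADF could run the same T-SQL on a schedule.
$backupSql = @"
BACKUP DATABASE [ProductionDb]
TO URL = 'https://examplestorage.blob.core.windows.net/backups/ProductionDb_copyonly.bak'
WITH COPY_ONLY, COMPRESSION, STATS = 10;
"@

# Authenticate with an Azure AD access token (requires the Az.Accounts and SqlServer modules).
$token = (Get-AzAccessToken -ResourceUrl 'https://database.windows.net').Token

# Long query timeout because a 600 GB backup can take a while.
Invoke-Sqlcmd -AccessToken $token -ServerInstance 'prod-sqlmi.example.database.windows.net' -Database 'master' -Query $backupSql -QueryTimeout 65535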
Create and Deploy Azure SQL Managed Instance Database Project integrated with Azure DevOps CICD

Integrating database development into continuous integration and continuous deployment (CI/CD) workflows is a best practice for Azure SQL Managed Instance database projects, and automating the process through a deployment pipeline is always recommended. This automation ensures that ongoing deployments stay aligned with your continuous local development, eliminating the need for additional manual intervention. This article guides you through the step-by-step process of creating a new Azure SQL Managed Instance database project, adding objects to it, and setting up a CI/CD deployment pipeline with Azure DevOps.

Prerequisites

- Visual Studio 2022 Community, Professional, or Enterprise
- Azure DevOps environment
- Contributor permission within Azure DevOps
- Sysadmin server role within the Azure SQL Managed Instance

Step 1

Open Visual Studio and click Create a new project. Search for SQL Server and select SQL Server Database Project. Provide the project name and the folder path where the project (and later the .dacpac file) will be stored, then click Create.

Step 2

Import the database schema from an existing database. Right-click on the project and select Import. You will see three options: Data-Tier Application (.dacpac), Database, and Script (.sql). In this case, I am using the Database option and importing from an Azure SQL Managed Instance. You will then see a screen that lets you provide a connection string: you can pick a database from local, network, or Azure sources, or enter the server name, authentication type, and credentials directly. Once connected, select the database to import into your project.

Step 3

Configure the import settings. Several options are available, each designed to optimize the process and ensure seamless integration:

- Import application-scoped objects: imports tables, views, stored procedures, and similar objects.
- Import reference logins: imports login-related objects.
- Import permissions: imports related permissions.
- Import database settings: imports database settings.
- Folder structure: lets you choose the folder structure used for database objects in the project.
- Maximum files per folder: limits the number of files per folder.

Click Start, which shows a progress window, and then click Finish to complete the step.

Step 4

To ensure a smooth deployment, start by incorporating any necessary post-deployment scripts into the project. These scripts are crucial for tasks that must run after the database has been deployed, such as data migrations or additional configuration. To compile the database project in Visual Studio, right-click the project and select Build. The build compiles the project defined by the .sqlproj file and produces a .dacpac file, which contains the database schema and is what gets deployed. When building, you may encounter warnings and errors that need careful debugging, commonly missing references, syntax errors, or configuration mismatches. After addressing them, rebuild the project to generate the .dacpac. Make sure any post-deployment scripts are included in the project; they run after the database deployment and perform any remaining tasks.
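If you want to verify the build outside Visual Studio before wiring up the pipeline, the same .dacpac can be produced from the command line. The following is a minimal PowerShell sketch; it assumes MSBuild and the SQL Server Data Tools build targets installed with Visual Studio 2022 are available on the PATH, and the project and output paths are placeholders.

# Minimal sketch: build the database project from the command line and confirm the .dacpac exists.
$projectPath = 'C:\source\MyMiDatabase\MyMiDatabase.sqlproj'   # placeholder project path

msbuild $projectPath /p:Configuration=Release /t:Build

# Output location may differ depending on the project's build settings.
$dacpac = Join-Path (Split-Path $projectPath) 'bin\Release\MyMiDatabase.dacpac'
if (Test-Path $dacpac) {
    Write-Host "Build succeeded: $dacpac"
} else {
    Write-Error "The .dacpac was not produced; check the build output for errors."
}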
To ensure all changes are tracked and can be deployed through your CI/CD pipeline, commit the entire codebase, including the .sqlproj file and any post-deployment scripts, to your branch in Azure DevOps. This guarantees that every modification is documented and ready for deployment.

Step 5

Create an Azure DevOps pipeline to deploy the database project.

Step 6

To have the YAML pipeline build the SQL project and publish the DACPAC file to the pipeline's artifact folder, include the following stages.

stages:
- stage: Build
  jobs:
  - job: BuildJob
    displayName: 'Build Stage'
    steps:
    - task: VSBuild@1
      displayName: 'Build SQL Server Database Project'
      inputs:
        solution: $(solution)
        platform: $(buildPlatform)
        configuration: $(buildConfiguration)
    - task: CopyFiles@2
      inputs:
        SourceFolder: '$(Build.SourcesDirectory)'
        Contents: '**\*.dacpac'
        TargetFolder: '$(Build.ArtifactStagingDirectory)'
        flattenFolders: true
    - task: PublishPipelineArtifact@1
      inputs:
        targetPath: '$(Build.ArtifactStagingDirectory)'
        artifact: 'dacpac'
        publishLocation: 'pipeline'

- stage: Deploy
  jobs:
  - job: Deploy
    displayName: 'Deploy Stage'
    pool:
      name: 'Pool'
    steps:
    - task: DownloadPipelineArtifact@2
      inputs:
        buildType: current
        artifact: 'dacpac'
        path: '$(Build.ArtifactStagingDirectory)'
    - task: PowerShell@2
      displayName: 'upgrade sqlpackage'
      inputs:
        targetType: 'inline'
        script: |
          # use evergreen or specific dacfx msi link below
          wget -O DacFramework.msi "https://aka.ms/dacfx-msi"
          msiexec.exe /i "DacFramework.msi" /qn
    - task: SqlAzureDacpacDeployment@1
      inputs:
        azureSubscription: '$(ServiceConnection)'
        AuthenticationType: 'servicePrincipal'
        ServerName: '$(ServerName)'
        DatabaseName: '$(DatabaseName)'
        deployType: 'DacpacTask'
        DeploymentAction: 'Publish'
        DacpacFile: '$(Build.ArtifactStagingDirectory)/*.dacpac'
        IpDetectionMethod: 'AutoDetect'

Step 7

To execute any pre- or post-deployment SQL scripts during deployment, update SqlPackage, obtain an access token, and then run the scripts.

# install all necessary dependencies onto the build agent
- task: PowerShell@2
  name: install_dependencies
  inputs:
    targetType: inline
    script: |
      # Download and Install Azure CLI
      write-host "Installing AZ CLI..."
      Invoke-WebRequest -Uri https://aka.ms/installazurecliwindows -OutFile .\AzureCLI.msi
      Start-Process msiexec.exe -Wait -ArgumentList "/I AzureCLI.msi /quiet"
      Remove-Item .\AzureCLI.msi
      write-host "Done."

      # prepend the az cli path for future tasks in the pipeline
      write-host "Adding AZ CLI to PATH..."
      write-host "##vso[task.prependpath]C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\wbin"
      $currentPath = (Get-Item -Path "HKCU:\Environment").GetValue('Path', '', 'DoNotExpandEnvironmentNames')
      if (-not $currentPath.Contains("C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\wbin")) {
        setx PATH ($currentPath + ";C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\wbin")
      }
      if (-not $env:path.Contains("C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\wbin")) {
        $env:path += ";C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\wbin"
      }
      write-host "Done."

      # install necessary PowerShell modules
      write-host "Installing necessary PowerShell modules..."
      Get-PackageProvider -Name nuget -force
      if ( -not (Get-Module -ListAvailable -Name Az.Resources) ) { install-module Az.Resources -force }
      if ( -not (Get-Module -ListAvailable -Name Az.Accounts) ) { install-module Az.Accounts -force }
      if ( -not (Get-Module -ListAvailable -Name SqlServer) ) { install-module SqlServer -force }
      write-host "Done."
- task: AzureCLI@2
  name: run_sql_scripts
  inputs:
    azureSubscription: '$(ServiceConnection)'
    scriptType: ps
    scriptLocation: inlineScript
    inlineScript: |
      # get access token for SQL
      $token = az account get-access-token --resource https://database.windows.net --query accessToken --output tsv

      # configure OELCore database
      Invoke-Sqlcmd -AccessToken $token -ServerInstance '$(ServerName)' -Database '$(DatabaseName)' -inputfile '.\pipelines\config-db.dev.sql'
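If you need to run the same post-deployment script from a workstation instead of the build agent, an equivalent approach with the Az PowerShell module (rather than the Azure CLI) is sketched below; the server and database names are placeholders, while the script path matches the pipeline example above.

# Minimal sketch: run a post-deployment script locally using an Azure AD access token.
# Assumes the Az.Accounts and SqlServer modules are installed and Connect-AzAccount has been run.
$token = (Get-AzAccessToken -ResourceUrl 'https://database.windows.net').Token

# Placeholder server and database; the input file is the same one the pipeline runs.
$params = @{
    AccessToken    = $token
    ServerInstance = 'my-managed-instance.example.database.windows.net'
    Database       = 'MyDatabase'
    InputFile      = '.\pipelines\config-db.dev.sql'
}
Invoke-Sqlcmd @params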
Automated Continuous integration and delivery – CICD in Azure Data Factory

In Azure Data Factory, continuous integration and delivery (CI/CD) involves moving Data Factory pipelines across different environments such as development, test, UAT, and production. This process leverages Azure Resource Manager templates to store the configurations of the various ADF entities, including pipelines, datasets, and data flows. This article provides a detailed, step-by-step guide on how to automate deployments using the integration between Data Factory and Azure Pipelines.

Prerequisites

- Azure Data Factory, with multiple ADF environments set up for the different stages of development and deployment.
- Azure DevOps, the platform for managing code repositories, pipelines, and releases.
- Git integration, with ADF connected to a Git repository (Azure Repos or GitHub).
- ADF Contributor and Azure DevOps Build Administrator permissions.

Step 1

Establish a dedicated Azure DevOps Git repository specifically for Azure Data Factory within the designated Azure DevOps project.

Step 2

Integrate Azure Data Factory (ADF) with the Azure DevOps Git repository created in the first step.

Step 3

Create a developer feature branch in the Azure DevOps Git repository created in the first step, then select that feature branch from ADF to start development.

Step 4

Begin the development process. For this example, I create a test pipeline, pl_adf_cicd_deployment_testing, and save all.

Step 5

Submit a pull request from the developer feature branch to main.

Step 6

Once the pull request is merged from the developer's feature branch into the main branch, publish the changes from the main branch to the ADF publish branch. The ARM templates (JSON files) are updated and become available in the adf_publish branch within the Azure DevOps ADF repository.

Step 7

ARM templates can be customized to accommodate different configurations for the Development, Testing, and Production environments. This customization is typically done through the ARMTemplateParametersForFactory.json file, where you specify environment-specific values for items such as linked services, environment variables, and managed links. For example, in a Testing environment the storage account might be named teststorageaccount, whereas in Production it could be prodstorageaccount.

To create an environment-specific parameters file:

- In the Azure DevOps ADF Git repo, open the main branch > linkedTemplates folder and copy ARMTemplateParametersForFactory.json.
- Create a parameters_files folder under the root path.
- Paste ARMTemplateParametersForFactory.json into the parameters_files folder and rename it to identify the environment, for example prod-adf-parameters.json.
- Update each environment-specific parameter value.

Step 8

To create the Azure DevOps CI/CD pipeline, use the following code, and make sure you update the variables to match your environment before running it. This allows you to deploy from one ADF environment to another, such as from Test to Production.
name: Release-$(rev:r)

trigger:
  branches:
    include:
    - adf_publish

variables:
  azureSubscription: <Your subscription>
  SourceDataFactoryName: <Test ADF>
  DeployDataFactoryName: <PROD ADF>
  DeploymentResourceGroupName: <PROD ADF RG>

stages:
- stage: Release
  displayName: Release Stage
  jobs:
  - job: Release
    displayName: Release Job
    pool:
      vmImage: 'windows-2019'
    steps:
    - checkout: self

    # Stop ADF Triggers
    - task: AzurePowerShell@5
      displayName: Stop Triggers
      inputs:
        azureSubscription: '$(azureSubscription)'
        ScriptType: 'InlineScript'
        Inline: |
          $triggersADF = Get-AzDataFactoryV2Trigger -DataFactoryName "$(DeployDataFactoryName)" -ResourceGroupName "$(DeploymentResourceGroupName)"
          if ($triggersADF.Count -gt 0) {
            $triggersADF | ForEach-Object {
              Stop-AzDataFactoryV2Trigger -ResourceGroupName "$(DeploymentResourceGroupName)" -DataFactoryName "$(DeployDataFactoryName)" -Name $_.name -Force
            }
          }
        azurePowerShellVersion: 'LatestVersion'

    # Deploy ADF using the ARM template and environment-specific JSON parameters
    - task: AzurePowerShell@5
      displayName: Deploy ADF
      inputs:
        azureSubscription: '$(azureSubscription)'
        ScriptType: 'InlineScript'
        Inline: |
          New-AzResourceGroupDeployment `
            -ResourceGroupName "$(DeploymentResourceGroupName)" `
            -TemplateFile "$(System.DefaultWorkingDirectory)/$(SourceDataFactoryName)/ARMTemplateForFactory.json" `
            -TemplateParameterFile "$(System.DefaultWorkingDirectory)/parameters_files/prod-adf-parameters.json" `
            -Mode "Incremental"
        azurePowerShellVersion: 'LatestVersion'

    # Restart ADF Triggers
    - task: AzurePowerShell@5
      displayName: Restart Triggers
      inputs:
        azureSubscription: '$(azureSubscription)'
        ScriptType: 'InlineScript'
        Inline: |
          $triggersADF = Get-AzDataFactoryV2Trigger -DataFactoryName "$(DeployDataFactoryName)" -ResourceGroupName "$(DeploymentResourceGroupName)"
          if ($triggersADF.Count -gt 0) {
            $triggersADF | ForEach-Object {
              Start-AzDataFactoryV2Trigger -ResourceGroupName "$(DeploymentResourceGroupName)" -DataFactoryName "$(DeployDataFactoryName)" -Name $_.name -Force
            }
          }
        azurePowerShellVersion: 'LatestVersion'

Triggering the Pipeline

The Azure DevOps CI/CD pipeline is designed to trigger automatically whenever changes are merged into the main branch and published. It can also be initiated manually or run on a schedule for periodic deployments, providing flexibility and ensuring that updates are deployed efficiently and consistently.

Monitoring and Rollback

To monitor pipeline execution, use the Azure DevOps pipeline dashboards. If a rollback is necessary, you can revert to previous versions of the ARM templates or pipelines in Azure DevOps and redeploy the changes.
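Before redeploying an older template during a rollback, or before any production deployment, it can help to preview the changes first. The following is a minimal PowerShell sketch using the what-if capability of New-AzResourceGroupDeployment; the resource group and file paths are placeholders based on the layout described above.

# Minimal sketch: preview what an ARM deployment would change before actually running it.
# Assumes the Az.Resources module is installed and Connect-AzAccount has been run.
$deployment = @{
    ResourceGroupName     = 'rg-adf-prod'                                  # placeholder resource group
    TemplateFile          = '.\TestADF\ARMTemplateForFactory.json'         # placeholder template path
    TemplateParameterFile = '.\parameters_files\prod-adf-parameters.json'
    Mode                  = 'Incremental'
}

# -WhatIf reports the resources that would be created, modified, or deleted without deploying them.
New-AzResourceGroupDeployment @deployment -WhatIf

# If the preview looks correct, run the same deployment without -WhatIf:
# New-AzResourceGroupDeployment @deployment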
Data Architecture and Designing for Change in the Age of Digital Transformation

Change is constant, whether you are designing a new product using the latest design thinking and human-centered product development, or carefully maintaining and managing changes to existing systems, applications, and services. In this post I would like to provide food for thought related to data architecture and change, as well as exposure to a practical analytics accelerator to capture change in data pipelines. Along the way I also want to discuss a couple of terms often referenced in data management and analytics discussions: 1) One Version of the Truth, and 2) Data Swamp. I have never liked either of these terms and will try to explain why realistically these are loaded, misleading, and rather biased terms. Here is the Analytics Accelerator on Change Data Management: https://github.com/DataSnowman/ChangeDataCapture