synapse devops
How to use CI/CD integration to automate the deploy of a Synapse Workspace to multiple environments
As an integrated analytics service that accelerates time to insight across data warehouses and big data systems, Azure Synapse Analytics delivers a unified experience through Synapse Studio, promoting collaborative work among Data Engineers, Data Scientists, and Business Analysts. By enabling this collaborative environment, Azure Synapse Analytics facilitates the integration of its big data and analytics capabilities with the enterprise CI/CD process. In this article we demonstrate how Azure Synapse Analytics can easily integrate with one of the most widely adopted software development methodologies: DevOps.

CICD Automation in Synapse Analytics: taking advantage of custom parameters in Workspace Templates

When using automated CI/CD in Azure Synapse Analytics, you can take advantage of custom parameters to extend the capabilities of the default Workspace template, allowing you to expose and override any artifact property that is not parameterized by default. This article walks you through the steps needed to create and benefit from custom template parameters in your Synapse CICD processes.
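As a hedged illustration of the mechanism (Synapse follows the same convention as Azure Data Factory here, and the property path shown is an assumption for illustration only), a custom template-parameters-definition.json placed in the root of the collaboration branch might expose a linked service connection string as a deployment parameter:

```json
{
    "Microsoft.Synapse/workspaces/linkedServices": {
        "*": {
            "properties": {
                "typeProperties": {
                    "connectionString": "="
                }
            }
        }
    }
}
```

The "=" tells the template generator to parameterize the property while keeping its current value as the parameter default.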

Automatic pause all Synapse Pools and keeping your subscription costs under control

As a Synapse engineer or Synapse support engineer, you may need to spin up and test some pools, and you want this to be as cost-efficient as possible. Leaving a pool with a high DWU allocation running over the weekend because you forgot to pause it after shutting down your computer is not a good approach, and we can quickly resolve this by using PowerShell together with Automation accounts.
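The article's approach uses PowerShell in an Automation account; purely for illustration, here is a minimal sketch of the same idea in Python with the azure-mgmt-synapse SDK. The resource group and workspace names are placeholders, and the client calls should be verified against the SDK version you install:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.synapse import SynapseManagementClient

# placeholders - substitute your own values
SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "<resource-group>"
WORKSPACE = "<workspace-name>"

client = SynapseManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# pause every dedicated SQL pool in the workspace that is still online
for pool in client.sql_pools.list_by_workspace(RESOURCE_GROUP, WORKSPACE):
    if pool.status == "Online":
        print(f"Pausing {pool.name} ...")
        client.sql_pools.begin_pause(RESOURCE_GROUP, WORKSPACE, pool.name).result()
```

Scheduled to run nightly or on weekends, a script like this catches any pool left running by mistake.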

Automating the Publishing of Workspace Artifacts in Synapse CICD

New features have recently been introduced in the Synapse Workspace Deployment task V2 to facilitate CICD automation in Synapse, giving users the ability to do one-touch deployments. Before these features were introduced, users had to manually hit the “Publish” button in Synapse Studio to persist their changes in the Synapse service (live mode) and generate the ARM templates for deployment in the publish branch. This was a showstopper for a fully automated CICD lifecycle. With these new features, manual intervention from the UI is no longer required, allowing fully automated CICD in Synapse. In addition, these features make it possible to validate as well as generate the ARM templates for deployment from any user branch.
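As a rough sketch only (the task and input names below are assumptions recalled from the Synapse workspace deployment extension's documentation and should be verified against the version you install), a pipeline step using the validate-and-deploy operation might look like this:

```yaml
# hypothetical pipeline step - verify task/input names against the extension docs
- task: Synapse workspace deployment@2
  inputs:
    operation: 'validateDeploy'   # validate the artifacts in the user branch, then deploy
    ArtifactsFolder: '$(System.DefaultWorkingDirectory)/<artifacts-folder>'
    TargetWorkspaceName: '<target-workspace>'
```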

How to set Spark / Pyspark custom configs in Synapse Workspace spark pool

In Azure Synapse, the system configuration of a Spark pool defines defaults for the number of executors, vCores, and memory. Some users need to change the number of executors or the memory assigned to a Spark session at execution time. Usually, we can reconfigure these by navigating to the Spark pool in the Azure portal and uploading a text file of Spark properties to the pool's configuration. But in the Synapse Spark pool, a few of these user-defined configurations get overridden by the pool's default values. What should be the next step to persist these configurations at the session level?

For notebooks: If we want to configure a session with more executors than are defined at the system level (say, 4 executors instead of a pool default of 2), we can set the session-level configuration before the session starts; a hedged sketch of this, together with code to confirm that the session really got 4 executors, appears at the end of this article. You can also cross-verify the executors in the Spark UI. The Apache Spark documentation lists many more session-level configs. We can also set the desired session-level configuration in an Apache Spark job definition.

For an Apache Spark job: If we want to add those configurations to our job, we have to set them when we initialize the Spark session or Spark context, for example for a PySpark job:

Spark Session:

```python
from pyspark.sql import SparkSession

if __name__ == "__main__":
    # create Spark session with the necessary configuration
    spark = SparkSession \
        .builder \
        .appName("testApp") \
        .config("spark.executor.instances", "4") \
        .config("spark.executor.cores", "4") \
        .getOrCreate()
```

Spark Context:

```python
from pyspark import SparkContext, SparkConf

if __name__ == "__main__":
    # create Spark context with the necessary configuration
    conf = SparkConf() \
        .setAppName("testApp") \
        .set("spark.hadoop.validateOutputSpecs", "false") \
        .set("spark.executor.cores", "4") \
        .set("spark.executor.instances", "4")
    sc = SparkContext(conf=conf)
```

Hope this helps you configure the number of executors for a job or notebook as needed.
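For the notebook case above, here is a minimal sketch. It assumes the Synapse %%configure magic, which must run in the first cell, before the Spark session starts; the values are illustrative:

```
%%configure -f
{
    "executorMemory": "8g",
    "executorCores": 4,
    "numExecutors": 4
}
```

Once the session is up, a quick check that the settings took effect:

```python
# confirm the session-level executor settings were applied; expect "4" for both
print(spark.sparkContext.getConf().get("spark.executor.instances"))
print(spark.sparkContext.getConf().get("spark.executor.cores"))
```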

How-To Deploy your Synapse Workspace Artifacts to a Managed VNET Synapse Workspace

This article will demonstrate how you can use the Synapse Workspace Deployment task in Azure DevOps to deploy your Synapse Workspace artifacts to a target Managed VNET Synapse Workspace that is configured to not allow public network access.

Deploying Synapse SQL Serverless objects across environments using SSDT

The long-awaited feature for all Synapse CICD fans is here! SqlPackage now supports serverless SQL pools in Extract and Publish operations. In this article, I will demonstrate how you can run this utility in a DevOps pipeline to replicate your SQL serverless objects across different environments.

Boost your CICD automation for Synapse SQL Serverless by taking advantage of SSDT and SqlPackage CLI

In this article I will demonstrate how you can take advantage of these tools when implementing CICD for the Azure Synapse serverless SQL engine. We will leverage SQL projects in SSDT to define our objects and implement deploy-time variables (SQLCMD variables). Through CICD pipelines, we will build the SQL project into a dacpac artifact, which enables us to deploy the database objects one or many times with automation.
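To make the Extract/Publish flow concrete, here is a hedged sketch of the two SqlPackage invocations involved. The server, database, project, and variable names are placeholders and authentication options are omitted; the flags shown (/Action, /SourceFile, /TargetServerName, /TargetDatabaseName, /Variables) are standard SqlPackage options:

```
# extract an existing serverless database into a dacpac
sqlpackage /Action:Extract \
    /SourceServerName:"<workspace>-ondemand.sql.azuresynapse.net" \
    /SourceDatabaseName:"<database>" \
    /TargetFile:"extracted.dacpac"

# publish the dacpac built from the SQL project, overriding a SQLCMD variable
sqlpackage /Action:Publish \
    /SourceFile:"bin/Release/<project>.dacpac" \
    /TargetServerName:"<workspace-test>-ondemand.sql.azuresynapse.net" \
    /TargetDatabaseName:"<database>" \
    /Variables:Env=test
```

Overriding SQLCMD variables at publish time is what lets the same dacpac target dev, test, and production without rebuilding.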

Data mesh: A perspective on using Azure Synapse Analytics to build data products

This is a multi-part blog series that discusses various aspects of implementing a data mesh architecture on Azure. This part focuses on the data-as-a-product principle and presents a perspective on using Azure Synapse Analytics as a data product. We discuss, at a high level, data product functions and capabilities and apply that lens to Synapse Analytics. We also discuss how workspaces can be partitioned to give domains the scale and agility to build data products.

CICD in Synapse SQL: How to deliver your database objects across multiple environments

Azure Synapse Analytics strongly promotes a collaborative environment, and with collaboration come many challenges throughout your project lifecycle. You can benefit from CICD capabilities to automate the delivery of all the work that is done in Synapse: not only the work done in Synapse Studio (pipelines, notebooks, SQL scripts, etc.) but also the development done at the SQL pool level. The goal of this article is to show you how you can easily set up a CICD pipeline to deliver all changes made to your Synapse SQL pool objects across multiple environments.