Synapse Studio
58 Topics

Enhancing Team Collaboration in Azure Synapse Analytics using a Git Branching Strategy – Part 1 of 3
Introduction

Over the past few years, many of the Synapse Studio users we have worked with have asked how to make the most of collaborative work in Synapse Studio, especially in complex development scenarios where developers work on different projects in parallel within a single Synapse workspace.

Based on our experience and internal feedback from other Synapse experts, our general recommendation is that each development team or project should have its own Synapse workspace. This approach is particularly effective when the teams’ maturity in both Synapse and Git is still developing. In such cases, having separate workspaces simplifies the CI/CD journey.

However, in scenarios where teams demonstrate greater maturity (especially in Git) and the number or complexity of Synapse projects is relatively low, it is possible for multiple teams and projects to coexist within a single Synapse development workspace. In these cases, evaluating your team’s maturity in both Synapse and Git is crucial. Teams must honestly assess their comfort level with these technologies. For example, expecting success from teams that are just beginning their Synapse journey and have limited Git experience, or planning to develop more than five projects in parallel within a single workspace, would likely lead to challenges. Managing even a single project in Synapse can be complex; doing so for multiple projects without sufficient expertise in both Synapse and Git could be a recipe for disaster.

That said, the main objective of this article is to demonstrate how a simple Git branching strategy can enhance collaborative work in Synapse Studio, enabling different projects to be developed in parallel within a single Synapse workspace. This guide can help teams at the beginning of their Synapse journey assess their current maturity level (in both Synapse and Git) and understand what level they should aim for to adopt this approach confidently.
For teams with a reasonable level of maturity, this article can help validate whether this strategy can further improve their collaborative efforts in Synapse.

This is the first of three articles, where we’ll show how to implement a simple branching strategy that allows two development teams working on separate projects to share a single Synapse workspace. The strategy supports isolated code promotion through various environments without interfering with each team’s work. While we use Azure DevOps as our Git provider throughout these articles, the approach is also applicable to GitHub.

Start elevating your collaborative work in Synapse Studio by implementing a simple and effective Git branching strategy

Let’s begin by outlining our scenario: two development teams, Data Engineering and Data Science, are about to start their projects in Synapse. Both teams have substantial experience with Synapse and Git. Together, they’ve agreed on a simple Git branching strategy that will enable them to collaborate effectively in Synapse Studio while supporting a CI/CD flow designed to automate the promotion of their code from the development environment to higher environments.

The Git branching strategy involves creating feature branches and environment branches, organized by team, as illustrated in the following diagram.

Figure 1: A Simple Git Branching Strategy

Important note on governance of the branching strategy: The first branches that should be created are the environment branches. Once these are in place, any time a developer needs to create a feature branch, it must always be based on the production environment branch of their respective team. In this strategy, the production branch serves as the team’s collaboration branch, ensuring consistency and alignment across development efforts.
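The governance rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the environments/<team>/<env> naming, the "dev" and "prod" labels, and every function name are hypothetical, and the features/<team>/<developer>/<feature> pattern is inferred from the single feature branch named in this article. The git commands it prints are standard (git switch requires Git 2.23 or later).

```python
from __future__ import annotations

# Hypothetical environment labels; the article names only a development
# and a production environment branch per team.
ENVIRONMENTS = ("dev", "prod")


def environment_branch(team: str, env: str) -> str:
    """Build an environment branch name (assumed pattern: environments/<team>/<env>)."""
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env!r}")
    return f"environments/{team}/{env}"


def feature_branch(team: str, developer: str, feature: str) -> str:
    """Build a feature branch name (inferred pattern: features/<team>/<developer>/<feature>)."""
    return f"features/{team}/{developer}/{feature}"


def create_feature_commands(team: str, developer: str, feature: str) -> list[str]:
    """Return git commands that start a feature branch from the team's production branch."""
    base = environment_branch(team, "prod")
    branch = feature_branch(team, developer, feature)
    return [
        "git fetch origin",
        f"git switch -c {branch} origin/{base}",
        f"git push -u origin {branch}",
    ]


def is_same_team_pull_request(source: str, target: str) -> bool:
    """Check that a pull request goes from a feature branch to its own team's environment branch."""
    src, dst = source.split("/"), target.split("/")
    return (
        len(src) == 4 and src[0] == "features"
        and len(dst) == 3 and dst[0] == "environments"
        and dst[2] in ENVIRONMENTS
        and src[1] == dst[1]
    )


if __name__ == "__main__":
    # Print the commands Mary would run for her feature branch.
    for command in create_feature_commands("data_eng", "mary", "mktetl"):
        print(command)
```

A check such as is_same_team_pull_request could run as a pre-merge validation so that a Data Engineering feature branch is never merged into a Data Science environment branch. Whether to enforce this in code or through branch policies is a team decision.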
Figure 2: Creating a Feature Branch Based on the Production Environment Branch

In the initial phase of implementing this strategy, environment branches can be created using the "Branches" feature in Azure DevOps, or locally in a developer’s repository and then pushed to the remote repository. Alternatively, teams can use the branch selector functionality within Synapse Studio. The team should choose the method they are most comfortable with.

Below is an example of the branch structure that will be developed throughout this article:

Figure 3: Example of Branching Structure Visualization from DevOps

Start at the feature branch level...

With the branching strategy defined, we can now demonstrate how the two teams will carry out their respective developments within a single Synapse development workspace.

Let’s begin with Mary from the Data Engineering team, who will develop a new pipeline. She creates this pipeline in her feature branch: features/data_eng/mary/mktetl.

Figure 4: Creating a Pipeline in a Feature Branch of the Data Engineering Team

Meanwhile, Anna, a developer from the Data Science team, also begins working on a new feature for the Data Science project.

Figure 5: Creating a Notebook in a Feature Branch of the Data Science Team

Both teams are ready to start their unit testing independently, at different times, and with distinct code executions. This is where the Environment Branches come into play.

…and end at the Environment branch level!

After completing the development of her feature, Anna promotes her changes to the development environment. It’s important to note that the code has only been committed to Git; it has not been published to Live Mode yet.

You might wonder why Anna didn’t simply use the Publish button in Synapse Studio to push her changes live. That would be a valid question if both teams were sharing a single collaboration branch (as described here).
In such a setup, the collaboration branch would contain code from both the Data Engineering and Data Science teams. However, that’s not the goal of our branching strategy. Our strategy is designed to ensure segregation at both the source control and CI/CD levels for all teams working within a shared Synapse development workspace. Instead of using a single collaboration branch for everyone, each team uses its own production environment branch as its collaboration branch.

In this context, using the Publish button in Synapse Studio is not appropriate. Instead, we leverage a feature of the Synapse public extension: the Synapse Workspace Deployment Task in Azure DevOps (or the GitHub Action for Synapse Workspace Artifacts Deployment, if using GitHub). This extension allows us to publish Synapse artifacts to any environment from any user branch, in this case from the environment branches. Therefore, when configuring Git for your Synapse development workspace under this strategy, you can set the collaboration branch to any placeholder (e.g., main, master, or develop), as it will be ignored.

This approach ensures that each team maintains code isolation throughout the development and deployment lifecycle. It’s important to understand that the decision not to use the Publish functionality in Synapse Studio is intentional and directly tied to our strategy of supporting multiple teams and multiple projects within a single Synapse workspace.

Figure 6: Data Science Team: Creating a Pull Request from the Feature Branch to an Environment Branch in Synapse Studio

Figure 7: Data Science Team: Configuring the Pull Request in DevOps, Indicating the Source (Feature Branch) and Destination (DEV Environment Branch)

Meanwhile, Mary, our Data Engineer, has also completed the development of her feature and is now ready to publish her pipeline to the development environment.
Figure 8: Data Engineering Team: Creating a Pull Request from the Feature Branch to an Environment Branch in Synapse Studio

Figure 9: Data Engineering Team: Configuring the Pull Request in DevOps, Indicating the Source (Feature Branch) and Destination (DEV Environment Branch)

Conclusion

This article has demonstrated how different development teams can effectively leverage a Git branching strategy to develop their code within a single Synapse development workspace. By creating both feature branches and environment branches, the teams are able to work in parallel without interfering with each other’s development processes. This approach ensures proper isolation and enables smooth code promotion across environments.

As we move forward, the next article in this series will explore how this strategy helps both teams accelerate their development lifecycle and streamline the CI/CD flow in Synapse.

55 Views, 0 likes, 0 Comments

Automating the Publishing of Workspace Artifacts in Synapse CICD
New features have recently been introduced in Synapse Workspace Deployment task V2 to facilitate CICD automation in Synapse. These features give users the ability to do one-touch deployments. Before these features were introduced, users had to manually click the “Publish” button in Synapse Studio to persist their changes in the Synapse Service (Live Mode) and generate the ARM templates for deployment in the publish branch. This was a showstopper for a fully automated CICD lifecycle. With these new features, users no longer need manual intervention from the UI, which allows fully automated CICD in Synapse. In addition, these features make it possible to validate as well as generate the ARM templates for deployment using any user branch.

21K Views, 12 likes, 7 Comments

Create a data solution on Azure Synapse Analytics with Snapshot Serengeti - Part 2 (Analytics)
This is the second blog in a four-part series on building an end-to-end data analytics and machine learning solution on Azure Synapse Analytics. If you haven't already, be sure to check out the first blog at https://aka.ms/synapseserengeti before proceeding.

7.3K Views, 1 like, 1 Comment

Writing data using Azure Synapse Dedicated SQL Pool Connector for Apache Spark
When using the Azure Synapse Dedicated SQL Pool Connector for Apache Spark, users can read and write large volumes of data efficiently between Apache Spark and a Dedicated SQL Pool in Synapse Analytics. The connector supports the Scala and Python languages in Synapse notebooks for these operations.

21K Views, 4 likes, 11 Comments

How to use CI/CD integration to automate the deploy of a Synapse Workspace to multiple environments
As an integrated analytics service that accelerates time to insight across data warehouses and big data systems, Azure Synapse Analytics delivers a unified experience through Synapse Studio, promoting collaborative work between Data Engineers, Data Scientists and Business Analysts. By enabling this collaborative environment, Azure Synapse Analytics facilitates the integration of its Big Data and Analytics capabilities with the enterprise CI/CD process. In this article we are going to demonstrate how Azure Synapse Analytics can easily integrate with one of the most adopted software development methodologies: DevOps.

38K Views, 4 likes, 10 Comments

CICD Automation in Synapse Analytics: taking advantage of custom parameters in Workspace Templates
When using automated CI/CD in Azure Synapse Analytics, users can take advantage of custom parameters to extend the capabilities of the default Workspace template, allowing the exposure and the overriding of any artifact property that is not parameterized by default. This article will walk you through the necessary steps to create and benefit from using custom template parameters in your Synapse CICD processes.

24K Views, 9 likes, 4 Comments

Synapse Database Templates for airlines & travel services plus seven industries are now GA
We’re pleased to announce the general availability of Synapse Database Templates for organizations operating in the Airline and Travel Services industries, along with enhanced versions of Synapse Database Templates for seven other previously released industries.

7K Views, 2 likes, 0 Comments