Explore the Manage Hub in Synapse Studio to provision and secure resources

Published Dec 16 2020 09:29 AM 3,305 Views
Microsoft

In this guide, you will use the Azure Synapse Analytics Knowledge center to understand the Manage hub blade of Synapse Studio. You will learn how Synapse integrates with other data sources, even non-Azure ones, such as Amazon S3; explore data contained within those sources; and then address access control scenarios in Synapse to facilitate collaboration while keeping resources secure.

 

Creating a Dedicated SQL Pool in Manage Hub

 

To successfully complete this tutorial, you will need to have a dedicated SQL pool provisioned. Luckily, you can do this within the Manage hub. To access the Hub, locate the selector on the left-hand bar of Synapse Studio.

 

saveenr_6-1607996611810.png

 

Under Analytics pools, select SQL pools. You will notice that your workspace comes with a serverless SQL pool by default. To create a dedicated pool, select + New.

 

saveenr_7-1607996611814.png

 

Provide a name for your dedicated pool and set it to a reasonable performance tier.

 

saveenr_8-1607996611820.png

 

Select Review + create. Then, select Create. Wait for your pool to provision and come online. If you want to learn more information about the difference between dedicated SQL pools and serverless SQL pools, you can find more here.

 

Creating a Pipeline from Knowledge Center

 

You will need to access the Knowledge center. Recall that it can be accessed under the ? button in the upper right-hand corner of Synapse Studio. Knowledge center will be the first menu option.

 

saveenr_9-1607996611823.png

 

Select Browse gallery. Then, select Pipelines. Search for the Movie Analytics pipeline. Select the pipeline and then select Continue.

 

saveenr_10-1607996611825.png

 

You will need to provide three user inputs to provision the pipeline: MoviesS3, MoviesADLS, and SinkDW.

 

saveenr_11-1607996611826.png

 

On the right side of the page, under the Preview section, you also see a high-level view of the pipeline. Configure MoviesS3 first. Lower the dropdown menu and select + New. The New linked service (Amazon S3) window will open. Provide the following configuration:

 

  • Name: Any descriptive name will suffice
  • Authentication type: Access key
  • Access Key ID: Provide the Access Key ID for an AWS IAM user with read permissions to the source S3 bucket
  • Secret Access Key: The Secret Access Key for the IAM user
  • Service URL: A path-style S3 URL referencing the bucket, not the CSV file in the bucket. Find more information here.

 

saveenr_12-1607996611828.png

 

If you want to confirm your S3 connection settings, select the Test connection button in the bottom right-hand corner of the page. You can test the connection to the S3 bucket or to the moviesDB.csv file by changing the selection of the Test connection radio buttons. Either way, both tests should be successful.

 

saveenr_13-1607996611829.png

 

Once the tests succeed, select Create. You have created your first linked service in Azure Synapse. Consider linked services like connection strings. Connection information is stored in linked services to simplify the process of creating datasets from sources. To specify the MoviesADLS input, select the dropdown. When you provisioned your Synapse workspace, you provisioned an Azure Data Lake Storage Gen2 account. Selecting your account linked service will suffice for this exercise.

 

saveenr_14-1607996611830.png

 

Finally, specify SinkDW. Again, choose the default linked service. It refers to the DNS endpoint of your workspace’s dedicated SQL pool(s).

 

saveenr_15-1607996611831.png

 

Now that you have specified all User inputs, select Open pipeline. In Synapse Studio, you will now see the Movie Analytics pipeline in the Data hub. We will be editing this pipeline soon.

 

saveenr_16-1607996611835.png

 

 

Configuring the Movie Analytics Pipeline

 

Select the MoveFromS3ToADLS activity. Select the Source tab. Select Open next to the MoviesS3 dataset. Verify that the Connection information is set correctly:

 

  • Linked service: The S3 linked service you created earlier
  • File path: The location of your moviesDB.csv file in [bucket]/[directory]/moviesDB.csv format
  • First row as header: selected
  • Retain all other defaults

 

saveenr_17-1607996611840.png

 

To test that your settings are correct, select the Preview data button.

 

saveenr_18-1607996611843.png

 

Returning to the Copy data activity, select the Sink tab. Select Open next to the MoviesADLS dataset. Ensure that the following information is provided under the Connection tab.

 

  • Linked service: Choose the reference to your workspace’s ADLS Gen2 account
  • File path: sample-data/ready-demo/moviesDB.csv
  • First row as header: selected
  • Keep all other defaults

 

saveenr_19-1607996611849.png

 

You are now ready to execute the Copy data activity. Add a breakpoint to the Copy data activity and debug it, as the image below demonstrates.

 

saveenr_20-1607996611852.png

 

After a few seconds, in the Output tab, you should see that the activity succeeds.

 

saveenr_21-1607996611855.png

 

You will now execute the AggregateAndWriteToDW data flow. First, turn on data flow debug. When you do this, Azure temporarily provisions an eight core cluster.

 

saveenr_22-1607996611858.png

 

Choose OK for the Turn on data flow debug dialog.

Open the AggregateAndWriteToDW data flow.

 

saveenr_23-1607996611860.png

 

Select the MoviesADLS source. Then choose the Data preview tab. Select Refresh to see the data loaded from ADLS Gen2.

 

saveenr_24-1607996611866.png

 

Since the emphasis of this guide is not on the data handling capabilities of Azure Synapse Analytics, proceed to the SinkToDW sink. Under the Settings tab, ensure that Allow upsert is enabled.

 

saveenr_25-1607996611868.png

 

Under the Sink tab, next to the Dataset dropdown, select Open.

Under the Connection tab, open Linked service properties. For DBName, provide the name of your dedicated SQL pool as the value. Keep the Table value the same (ADF_Lab_Sink).

Return to the Integrate hub and access the Movie Analytics pipeline.

 

saveenr_26-1607996611870.png

 

Debug the pipeline. Do not set breakpoints.

In the Output tab, monitor the execution of both activities. As indicated below, the eyeglasses icon allows you to examine the debug results of a pipeline activity more in-depth.

 

saveenr_27-1607996611873.png

 

Examining the results for the AggregateAndWriteToDW data flow should reveal that 737 rows were written to the SQL dedicated pool.

 

saveenr_28-1607996611874.png

 

Feel free to Publish your new pipeline and your modifications to it. That will facilitate your knowledge of securing Azure Synapse resources.

 

Access Control in Azure Synapse Analytics

 

Now that you have created SQL pools, created and edited pipelines, and added linked services, this raises the question of securing your resources. To control access to your resources, Microsoft recommends the use of Azure Active Directory security groups, simplifying security administration to verifying that users are placed in the correct groups. For example, you might create a SynapseContributors security group for developers. Synapse Roles, like Synapse Contributor, can then be assigned to security principals, which include AAD groups. These role assignments can be limited to a certain scope.

 

saveenr_29-1607996611876.png

 

Such a role assignment can be created in the Manage hub. Locate the Access control page under the Security tab and select Add. Here are a couple of Synapse Roles that you should be aware of:

 

  • Synapse Administrator: Access to all Synapse resources and published artifacts. The recipient can grant others roles.
  • Synapse Contributor: Access to all Synapse resources except managed private endpoints (more information below) and credentials.

Note that you can change the scope of your Azure Synapse role assignments. For example, we can give a certain user permissions to use the Synapse workspace’s managed identity (WorkspaceSystemIdentity). Note that the Synapse Roles you can assign to the security principal are reduced–in this example, only the Synapse Administrator and Synapse Credential User (preview) roles can leverage the managed identity. Managed identities provide simple and secure authentication to services that use Azure Active Directory for authentication, like Azure Data Lake.

 

saveenr_30-1607996611877.png

 

Azure provides even more capabilities to govern the access and administration of Azure Synapse Analytics. For users to create compute resources, including SQL pools, they must have at least the Azure Contributor role on the workspace. This role assignment can be added through the Access control (IAM) panel of the Synapse workspace Azure resource.

 

saveenr_31-1607996611879.png

 

Lastly, we will discuss SQL permissions. If you have the Synapse Administrator role, you are already a db_owner on the serverless pool which is provided with your workspace, Built-in. Suppose you are a developer on the project. The Synapse Administrators can give you access through the SynapseContributors group using the following commands:

 

CREATE DATABASE analyticsdb;
use analyticsdb

CREATE USER [SynapseContributors] FROM EXTERNAL PROVIDER;
ALTER ROLE db_owner ADD MEMBER [SynapseContributors];

 

SynapseContributors is an AAD security principal (group). To enter this script, navigate to the Develop hub. Then, create a new SQL script.

 

saveenr_32-1607996611881.png

 

The situation is slightly different for the dedicated SQL pool. Instead of using the ALTER ROLE T-SQL statement, you use the sp_addrolemember stored procedure.

 

CREATE USER [SynapseContributors] FROM EXTERNAL PROVIDER;
EXEC sp_addrolemember 'db_owner', 'SynapseContributors';

 

In the Connect to bar, the dedicated pool is specified and the specific database used in the pipeline earlier has been selected.

 

saveenr_33-1607996611883.png

 

Azure Synapse Analytics provides a whole host of access control solutions. To learn more, please consult this. We will conclude this post with a discussion of network security in Azure Synapse Analytics.

 

Network Security in Azure Synapse Analytics

 

Azure Synapse Analytics allows you to provision a managed virtual network for your workspace. With the managed VNet, administators do not need to handle the burden of configuring traffic management rules, since that configuration is handled by Synapse. Moreover, using the managed VNet provides support for managed private endpoints. These endpoints are created in the managed VNet and enable access to Azure services. Communication between private endpoints and Azure resources occurs over private links, which transfer data through Microsoft’s network infrastructure.

 

saveenr_34-1607996611893.png

 

Examine the image above from the Microsoft documentation. Here are a couple of important ideas you can pull from this diagram:

 

  • The SQL pools exist outside of the managed VNet, but they can be referenced using managed private endpoints
  • Private endpoints can reference Azure resources within the same Azure tenant and even resources in subscriptions outside of the AAD tenant

Clearly, using managed virtual networks with your Synapse workspaces avoids data exfiltration. You can manage private endpoints in the Manage hub, as seen below. If you provision you workspace with support for managed VNets, endpoints referencing both the SQL pool and the serverless SQL pool are automatically created.

 

saveenr_35-1607996611901.png

 

To reap the benefits of managed VNets in Synapse workspaces, you will need to provision your workspace with support for the managed VNet–you cannot add this after the workspace is provisioned.

 

saveenr_36-1607996611907.png

 

In this image, managed virtual network support is enabled for this Synapse workspace deployment. Moreover, since the Allow outbound data traffic only to approved targets is set, the AD tenants of target resources will need to be added.

 

Clean-up

 

In this post, you created a dedicated SQL pool. To locate the pool, navigate to Manage hub, and select SQL pools below Analytics pools. Select the three dots next to the dedicated pool you created.

 

saveenr_37-1607996611909.png

 

You can either Pause the SQL pool if it is running, or Delete the pool. Note that if you pause the dedicated pool, you will release the compute node(s), thus halting compute billing. Alternatively, you can delete the SQL pool.

 

Quick get started with Azure Synapse and try this tutorial with these resources:

 

 

saveenr_38-1607996611919.jpeg

 

 

2 Comments
Occasional Contributor

Getting an error with connecting to the file in S3 (however, the linked service connection to the bucket was successful). Any clues?

The operation on file MoviesDB.csv under directory <folder> is failed due to exception. A WebException with status TrustFailure was thrown. The underlying connection was closed: Could not establish trust relationship for the SSL/TLS secure channel. The remote certificate is invalid according to the validation procedure. . Activity ID: ab4dddd8-92e8-46c1-8659-411ed3a3ec1f

 

verargulla_0-1620822789359.png

 

Senior Member

@verargulla 

We've found that by not setting the service URL to the s3 bucket we have overcomed the TLS error

Version history
Last update:
‎Dec 16 2020 09:52 AM
Updated by: