Understanding Azure Synapse Private Endpoints
Published May 06 2021 01:51 PM 72.5K Views
Microsoft

In the past year or so, I've been knee-deep in Azure Synapse. I have to say, it's been a super popular platform in Azure. Many clients are either migrating to Azure Synapse from SQL Server or data warehouse appliances, or implementing net-new solutions on Synapse Analytics.

 

One of the most frequently asked questions revolves around security. As companies move sensitive data to the cloud, checks and balances need to be in place to meet security requirements, and the first question that comes up is: does my data flow through the internet?

 

When it comes down to private endpoints, virtual networks, private and public IPs, things start getting complex...

 

So let’s try to make sense of all this.

 

Note, I will not be doing a deep dive into networking, as there are people who are more knowledgeable on this subject. But I will try to clarify things to the best of my abilities.

 

Network security

In order to expand on the topic of security and network traffic, we need to dive into network security.

 

This topic can be broken down into a few categories:

 

  • Firewall
  • Virtual network
  • Data exfiltration
  • Private endpoint

 

Firewall

Bing defines firewall as "... a security device that monitors and filters incoming and outgoing network traffic based on an organization's previously established security policies. ... A firewall's main purpose is to allow non-threatening traffic in and to keep dangerous traffic out."

 

In the context of Azure Synapse, it will allow you to grant or deny access to your Synapse workspace based on IP addresses. This can be effectively used to block traffic to your workspace via the internet. Normally, firewalls would control both outbound and inbound traffic, but in this case, it's inbound only.

 

I'll cover outbound later when talking about managed virtual network and data exfiltration.

 

When creating your workspace, you have the option to allow ALL IP addresses through.

 

[Figure: IP Filtering]

 

If you enable this option, you'll end up with the following rule added:

 

[Figure: IP Filtering Rules]

 

Note, if you don't enable this, you will NOT be able to connect to your workspace right away. Best to keep it enabled, then go back and modify / tweak it.

See this documentation from Microsoft on Synapse workspace IP Firewall rules
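
To make the rule semantics concrete, here's a minimal Python sketch of how an inbound IP firewall rule can be evaluated. It assumes each rule is an inclusive start/end IPv4 range; the rule names (`allowAll`, `office`), the addresses and the function names are illustrative assumptions, not Synapse's actual implementation.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass(frozen=True)
class FirewallRule:
    """An inbound rule expressed as an inclusive IPv4 range (illustrative model)."""
    name: str
    start: IPv4Address
    end: IPv4Address

    def allows(self, client_ip: str) -> bool:
        return self.start <= IPv4Address(client_ip) <= self.end


def is_allowed(client_ip: str, rules: list[FirewallRule]) -> bool:
    """A client is let in if at least one rule's range contains its IP."""
    return any(rule.allows(client_ip) for rule in rules)


# The wide-open rule you get when all IP addresses are allowed (name is assumed).
allow_all = FirewallRule("allowAll", IPv4Address("0.0.0.0"), IPv4Address("255.255.255.255"))

# A tightened rule set: only a hypothetical office range.
office_only = [FirewallRule("office", IPv4Address("203.0.113.0"), IPv4Address("203.0.113.255"))]

print(is_allowed("198.51.100.7", [allow_all]))  # True
print(is_allowed("198.51.100.7", office_only))  # False
print(is_allowed("203.0.113.42", office_only))  # True
```

This is why tightening the rules after deployment matters: as long as the wide-open range is present, every other rule is redundant.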

 

Virtual Network

A virtual network gives you network isolation from other workspaces. This is accomplished by enabling the "Enable managed virtual network" option during the deployment of the workspace.

 

[Figure: Enable Managed Virtual Network]

 

Alert, you can only enable this option during the creation of your workspace.

The great thing about this is it gives you all the benefits of having your workspace in a virtual network without the need to manage it. Look it up here for more details on benefits.

 

Data Exfiltration

Another benefit of enabling managed virtual network and private endpoints, which we're tackling next, is that you're now protected against data exfiltration.

 

Definition: data exfiltration occurs when malware and/or a malicious actor carries out an unauthorized data transfer from a computer. It is also commonly called data extrusion or data exportation.

In the context of Azure, protection against data exfiltration guards against malicious insiders accessing your Azure resources and exfiltrating sensitive data to locations outside of your organization’s scope.

 

In addition to enabling the managed virtual network option, you can also specify which Azure Active Directory tenant your workspace can communicate with.

 

[Figure: Specify AD Tenant]

 

Check out this documentation on data exfiltration with Synapse
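
Conceptually, the approved-tenant setting acts like an allow-list on outbound connections. Here's a rough Python sketch of that idea, assuming a simplified model where every outbound target is identified by the tenant it lives in. The class, the function and the tenant IDs are hypothetical placeholders, not an actual Synapse API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutboundTarget:
    """A resource the workspace wants to reach (hypothetical model)."""
    resource_name: str
    tenant_id: str


def is_outbound_allowed(target: OutboundTarget, approved_tenants: set[str]) -> bool:
    """Allow the connection only if the target lives in an approved tenant."""
    return target.tenant_id in approved_tenants


# Placeholder tenant IDs, not real ones.
HOME_TENANT = "11111111-1111-1111-1111-111111111111"
UNKNOWN_TENANT = "99999999-9999-9999-9999-999999999999"
approved = {HOME_TENANT}

company_lake = OutboundTarget("companydatalake", HOME_TENANT)
outside_account = OutboundTarget("outsidestorage", UNKNOWN_TENANT)

print(is_outbound_allowed(company_lake, approved))     # True
print(is_outbound_allowed(outside_account, approved))  # False
```

The point of the sketch is the direction of the check: it is the destination of the data that gets validated, which is what stops an insider from copying data to a storage account they control.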

 

Private Endpoints

Microsoft defines Private Endpoints as "Azure Private Endpoint is a network interface that connects you privately and securely to a service powered by Azure Private Link. Private Endpoint uses a private IP address from your VNet, effectively bringing the service into your VNet."

 

In short, you can access a public service using a private endpoint.

 

Every Synapse workspace comes with a few endpoints which are used to connect to from various applications:

 

[Figure: Synapse workspace endpoints]

 

  • Dedicated SQL endpoint: used to connect to the Dedicated SQL Pool from external applications like Power BI or SSMS.
  • Serverless SQL endpoint: used to connect to the Serverless SQL Pool from external applications like Power BI or SSMS.
  • Development endpoint: used by the workspace web UI as well as DevOps to execute and publish artifacts like SQL scripts and notebooks.
  • Workspace web URL: used to connect to the Synapse Studio web UI.
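
As a quick reference, the small Python helper below builds the host name behind each of these endpoints from a workspace name. The host-name patterns are my assumption based on the usual public-cloud naming (the `.sql` one matches the nslookup example in this post), so double-check them against the Overview blade of your own workspace. The function name is made up for illustration.

```python
def workspace_endpoints(workspace_name: str) -> dict[str, str]:
    """Return the assumed host name behind each Synapse workspace endpoint."""
    return {
        "dedicated_sql": f"{workspace_name}.sql.azuresynapse.net",
        "serverless_sql": f"{workspace_name}-ondemand.sql.azuresynapse.net",
        "development": f"{workspace_name}.dev.azuresynapse.net",
        # Synapse Studio is served from a shared host, not a per-workspace one.
        "studio_web": "web.azuresynapse.net",
    }


for role, host in workspace_endpoints("synapseblog-ws").items():
    print(f"{role:15} {host}")
```

Note that three of the four names are unique per workspace while the Studio host is shared, which becomes relevant once DNS for private endpoints enters the picture.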

 

Take the dedicated SQL endpoint as an example: once you add a private endpoint to it, what's basically happening is that when you connect to it, your request goes through a redirection to a private IP.

 

If you run an nslookup against the SQL endpoint, you can see it resolves to the private endpoint:

 

 

nslookup synapseblog-ws.sql.azuresynapse.net

 

 

[Figure: Traceroute Output]
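
If you'd rather script this check than eyeball nslookup output, here's a minimal Python sketch using only the standard library. It resolves a host name and reports whether each returned address is private. The function names are my own, and port 1433 is only used as a placeholder for the lookup. Run `check_endpoint` from a machine inside the vNet (or one that uses its DNS); from anywhere else, you should expect a public address.

```python
import ipaddress
import socket


def resolve_ips(hostname: str) -> list[str]:
    """Resolve a host name to its unique IP addresses (needs network access)."""
    infos = socket.getaddrinfo(hostname, 1433, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def is_private_ip(address: str) -> bool:
    """True for private-range addresses such as 10.x.x.x or 192.168.x.x."""
    return ipaddress.ip_address(address).is_private


def check_endpoint(hostname: str) -> None:
    """Print every address the host name resolves to, labelled private or public."""
    for ip in resolve_ips(hostname):
        kind = "private" if is_private_ip(ip) else "public"
        print(f"{hostname} -> {ip} ({kind})")


# Offline sanity check of the classification logic.
print(is_private_ip("10.0.0.5"))  # True
print(is_private_ip("8.8.8.8"))   # False

# Example call (not executed here because it needs DNS access):
# check_endpoint("synapseblog-ws.sql.azuresynapse.net")
```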

 

Managed Private Endpoints

Synapse uses a managed VNET / Subnet (i.e. not a customer’s one) and exposes private endpoints in customers’ vnets as needed. This is the reason you never pick a VNET in the wizard during the creation.

 

Since that VNET belongs to Microsoft and is managed, it is isolated by itself. It therefore requires private endpoints to other PaaS services to be created in it.

 

It is similar to how the managed VNET feature of Azure Data Factory operates.

I have a diagram outlining all this later.

 

When you create a new Synapse workspace, you'll notice in Synapse Studio, under the Manage hub, in the Security section under Managed private endpoints, that 2 private endpoints were created by default.

 

[Figure: Managed Private Endpoint]

 

Note, for the curious who noticed the private endpoint blade in the Azure portal for the Synapse resource and are wondering what that's about, I'll cover that next.

When you deploy a Synapse workspace in a managed virtual network, you need to tell Synapse how to communicate with other Azure PaaS (Platform as a Service) resources.

 

Therefore, these endpoints are required by Synapse's orchestration (the Studio UI, Synapse Pipelines, etc.) to communicate with the 2 SQL pools: dedicated and serverless. This will make more sense once you see the detailed architecture diagram.

 

Alert, one common issue I see people facing is their Spark pools not being able to read files on the storage account. This is because you need to manually create a managed private endpoint to the storage account.

Check out this documentation to see how: How to create a managed private endpoint
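
When troubleshooting this, it helps to list which storage accounts are not yet covered. The Python sketch below shows the idea with a simplified, hypothetical model of a managed private endpoint (target account, sub-resource, approval state). It assumes that an endpoint only counts once it has been approved on the storage account side. The account names and the `dfs` default are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ManagedPrivateEndpoint:
    """Simplified view of a managed private endpoint (hypothetical model)."""
    target_account: str
    sub_resource: str  # e.g. "dfs" or "blob"
    approved: bool


def accounts_missing_endpoint(
    linked_accounts: list[str],
    endpoints: list[ManagedPrivateEndpoint],
    sub_resource: str = "dfs",
) -> list[str]:
    """Return linked storage accounts without an approved endpoint for the sub-resource."""
    covered = {
        ep.target_account
        for ep in endpoints
        if ep.approved and ep.sub_resource == sub_resource
    }
    return [name for name in linked_accounts if name not in covered]


endpoints = [
    ManagedPrivateEndpoint("workspacelake", "dfs", approved=True),
    ManagedPrivateEndpoint("landingzone", "dfs", approved=False),  # still pending approval
]
print(accounts_missing_endpoint(["workspacelake", "landingzone", "archive"], endpoints))
# ['landingzone', 'archive']
```

Both failure modes show up in the output: an endpoint that was never created (`archive`) and one that was created but never approved (`landingzone`).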

 

Private Endpoint Connections

Now that we've covered managed private endpoints, you're probably asking yourself why you have a private endpoint connection blade in the Azure portal for your Synapse workspace.

 

[Figure: Private Endpoint Blade in Portal]

 

Where managed private endpoints allow the workspace to connect to other PaaS services outside of its managed virtual network, private endpoint connections allow everyone and everything to connect to the Synapse endpoints using a private endpoint.

 

You will need to create a private endpoint for the following:

   
  • Dedicated SQL endpoint: select the SQL sub resource during the creation.
  • Serverless SQL endpoint: select the SqlOnDemand sub resource during the creation.
  • Development endpoint: select the DEV sub resource during the creation.

 

You might've noticed that the list of private endpoints only has 3 entries while your workspace has 4 endpoints. That's because the Studio workspace web URL needs a Private Link Hub to set up the secured connection.
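
To keep track of what still needs to be created, the mapping above can be written down as data. The Python sketch below is just a checklist helper, not an Azure SDK call. The sub resource names follow the spelling used in this post (the portal's casing may differ, so the comparison ignores case), and the `Web` sub resource for the Private Link Hub is my assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrivateEndpointSpec:
    endpoint: str         # which Synapse endpoint is being secured
    parent_resource: str  # resource the private endpoint is created against
    sub_resource: str     # target sub resource chosen during creation


REQUIRED_PRIVATE_ENDPOINTS = [
    PrivateEndpointSpec("Dedicated SQL endpoint", "Synapse workspace", "SQL"),
    PrivateEndpointSpec("Serverless SQL endpoint", "Synapse workspace", "SqlOnDemand"),
    PrivateEndpointSpec("Development endpoint", "Synapse workspace", "DEV"),
    # Assumption: the hub's sub resource is named "Web".
    PrivateEndpointSpec("Workspace web URL", "Private Link Hub", "Web"),
]


def missing_endpoints(created_sub_resources: set[str]) -> list[str]:
    """List the endpoints that still need a private endpoint (case-insensitive match)."""
    have = {name.lower() for name in created_sub_resources}
    return [
        spec.endpoint
        for spec in REQUIRED_PRIVATE_ENDPOINTS
        if spec.sub_resource.lower() not in have
    ]


print(missing_endpoints({"Sql", "Dev"}))
# ['Serverless SQL endpoint', 'Workspace web URL']
```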

 

Check out this document for instructions on how to set this up.

 

Connect to Azure Synapse Studio using Azure Private

 

Time to put it all together!

Now that we've covered firewalls, managed private endpoints, private endpoint connections and the private link hub, let’s take a look at what it looks like when you deploy a secured end-to-end Synapse workspace.

 

[Figure: Azure Synapse Detailed Diagram]

This architecture assumes the following:

 

  1. You have two storage accounts: one for the workspace file system (this is required by the Synapse deployment) and another to store any audits and logs.

  2. For each of the storage accounts, you've disabled access from all networks and enabled the firewall to block internet traffic.

 

Now let's break this diagram down.

 

Synapse workspace

[Figure: Synapse Workspace architecture]

 

  1. The virtual network created as part of the managed vNet workspace deployment. This vNet is managed by Microsoft and cannot be seen in the Azure portal's resource list.

  2. It contains the compute for the Azure integration runtime (a self-hosted integration runtime runs on your own machines, outside this vNet) and the compute for the Synapse Dataflow.

  3. Any Spark pools will create virtual machines behind the scenes. These will also be hosted inside the managed virtual network (vNet).

  4. The Serverless SQL pool is a multi-tenant service and will not be physically deployed in the vNet but you can communicate with the service via private endpoints.

  5. The Dedicated SQL pool, same as the Serverless SQL pool, is a multi-tenant service and will not be physically deployed in the vNet, but you can communicate with the service via private endpoints.

     

    Remember the two managed private endpoints created when you deployed your new Synapse workspace? This is why they're created.

 

Synapse Studio

[Figure: Actual Synapse Workspace architecture]

 

  1. The workspace studio UI is a single page application (SPA) and is created as part of the Synapse workspace deployment.

  2. Utilizing an Azure Synapse Private Link Hub, you're able to create a private endpoint into the customer-owned vNet.

  3. Users can connect to the Studio UI using this private endpoint.

  4. Executions like notebooks or SQL scripts made from the Studio web interface will submit commands via the DEV private endpoint and run on the appropriate pool.

     

    Note, the web app for the UI will not be visible and is managed by Microsoft.

 

Storage Accounts and Synapse

[Figure: Storage Accounts Private Endpoints]

 

  1. For each workspace created, you will need to specify a storage account / file system with hierarchical namespace enabled in order for Synapse to store its metadata.

  2. When your storage account is configured to limit access to certain vNets, endpoints are needed to allow the connection and authentication. Similar to how Synapse needs private endpoints to communicate with the storage account, any external systems or people that need to read or write to the storage account will require a private endpoint.

  3. Every storage account that you connect to your Synapse workspace via linked services will need a managed private endpoint like we mentioned previously. This applies to each service within the managed vNet.

  4. Optional: You can use another storage account to store any logs or audits.

     

    Note, logs and audits cannot use storage accounts with hierarchical namespace enabled. Hence the reason why we have 2 storage accounts in the diagram.
  5. The SQL pools, which are multi-tenant services, talk to the storage account over public IPs but use trusted service based isolation. However, going over public IPs doesn't mean data is going to the internet. Azure networking implements cold potato routing, so traffic stays on the Azure backbone as long as the two entities communicating are on Azure. The trusted service access can be configured within the storage account networking configuration.

     

    [Figure: Storage Account Trusted MSI]

     

    Or can also be set during the Synapse workspace creation.

     

    [Figure: Storage Account Trusted MSI]

     

    Definition: In commercial network routing between autonomous systems which are interconnected in multiple locations, hot-potato routing is the practice of passing traffic off to another autonomous system as quickly as possible, thus using their network for wide-area transit. Cold-potato routing is the opposite, where the originating autonomous system holds onto the packet until it is as near to the destination as possible.
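
To illustrate the difference between the two routing policies, here's a toy Python model. It places the source, the destination and the interconnection points between two networks on a simple number line as a stand-in for geographic distance. Everything in it (the positions, the function name, the policy labels) is made up for illustration and says nothing about how Azure actually picks routes.

```python
def pick_handoff(source: int, destination: int, exchanges: list[int], policy: str) -> int:
    """Pick the interconnect where the originating network hands traffic off.

    Positions are points on a line (a toy stand-in for geographic distance).
    """
    if policy == "hot":
        # Hand off as early as possible: the exchange nearest the source.
        return min(exchanges, key=lambda x: abs(x - source))
    if policy == "cold":
        # Hold the packet as long as possible: the exchange nearest the destination.
        return min(exchanges, key=lambda x: abs(x - destination))
    raise ValueError(f"unknown policy: {policy}")


exchanges = [10, 50, 90]
hot = pick_handoff(0, 100, exchanges, "hot")
cold = pick_handoff(0, 100, exchanges, "cold")

print(hot, "-> distance carried on the originating network:", abs(hot - 0))    # 10 -> ... 10
print(cold, "-> distance carried on the originating network:", abs(cold - 0))  # 90 -> ... 90
```

With cold-potato routing the originating network carries the packet for most of the trip, which is the behaviour that keeps traffic on the provider's own backbone.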

 

Private endpoints in customer-owned vNet

Like I mentioned previously for the storage accounts, private endpoints need to be created in the customer's vNet for the following:

 

  1. Dedicated SQL Pool
  2. Serverless SQL Pool
  3. Dev

 

Like you can see here:

 

[Figure: Private Endpoints created in the portal]

 

Conclusion

Hope this helps clarify some of the complexities of deploying a secured Synapse workspace and that you understand the nuances of each private endpoint.

 

The last piece of the puzzle that can cause issues would be authentication and access control.

 

I can't recommend strongly enough that you go through this documentation which outlines all the steps you need to take.

 

How to set up access control for your Synapse workspace

 

Thanks!

5 Comments
Copper Contributor

@Benjamin Leroux what would be a recommended method to connect Synapse to services like Event Hub? Or any other Azure service that is not natively supported by Synapse through "Managed Private Endpoint"

Copper Contributor

Hi @Benjamin Leroux 

your blog post from earlier this year has helped my understanding of the setup - thanks for that!

I'm working on a typical hub-spoke company setup where part of the infrastructure is on prem and the resources would need to be discoverable across networks. 

For this to work you also need to set up a (redundant/resilient) DNS forwarder (on the hub) and, on the existing on-prem DNS, selectively forward some DNS queries to this DNS forwarder. In addition you would need to add Azure Private DNS zones for the private links and keep those associated with the VNET/VNETs that should be able to talk to Synapse.

The typical setup is described here: https://docs.microsoft.com/en-us/azure/private-link/private-endpoint-dns (picture below pulled from there), and to complete the setup as you suggested the private zones required would be

  • privatelink.sql.azuresynapse.net
  • privatelink.dev.azuresynapse.net
  • privatelink.web.azuresynapse.net
  • privatelink.dfs.core.windows.net
  • privatelink.blob.core.windows.net

Could you add information around this to your blog?

MatsElfving_0-1624535260028.png

Microsoft

This is a great article Ben, thanks for putting it all in one place, with great pictures too.  

One thing to note for large customers that want to use private networking everywhere, the name for the Synapse Studio Portal private endpoint is just "web.azuresynapse.net", instead of the instance name like the other resources.  For folks with integrated DNS across Azure, this means that registering this in a spoke landing zone for the first Synapse workload would in effect mean that the second Synapse workload that wanted private endpoints for the studio would be unable to register that DNS name twice (since it isn't unique).  

Two options for that, the first being just use the public endpoint for the Synapse Studio.  This is probably the easiest and best way, with very low risk.

- it's control plane access,

- requires Azure RBAC, Azure AD MFA or conditional access, etc.

- threat model is very similar to the Azure Portal itself

 

The other option might be to have a private endpoint hub for the Synapse Studio created centrally in the hub network, and allow any Synapse based workload to use that instead.  Would need a bit more planning, firewall rule consideration, etc.

 

Awesome article Ben.

Copper Contributor

Hi @Vallentyne,

 

If we allow public endpoint for the workspace, isn't it allowing public network access? That would defeat the purpose of isolating the resource with private endpoint, right?

Copper Contributor

Benjamin

I currently have to use a Self-Hosted Integration Runtime to connect to an on-premises SQL Server from an Azure Synapse workspace (private endpoint enabled).
I heard that there is a new service, "Azure Private DNS Resolver", so I was wondering: if that is configured, would we still need to use the Self-Hosted Integration Runtime?
There is ExpressRoute connectivity in place.

Last update: Apr 27 2021 12:44 PM