Author: Luke Moloney is a Senior Program Manager in the Azure Synapse Customer Success Engineering (CSE) team.
Data Exfiltration Protection (DEP) is a feature that enables additional restrictions on the ability of Azure Synapse Analytics to connect to other services – enabling you to further secure your Azure Synapse Analytics deployment. There are a couple of key things to know about DEP:
This article will focus specifically on how DEP impacts the use of Synapse. Azure Data Factory does not currently support deployment with DEP.
DEP can only be enabled when an Azure Synapse Analytics workspace is created. It is enabled by selecting ‘Allow outbound data traffic only to approved targets’, an option that is only available when the workspace is created with the ‘Managed virtual network’ option enabled. Both options are selected on the Networking tab of Azure Synapse Analytics workspace creation. These parameters are also available when deploying an Azure Synapse Analytics workspace programmatically (e.g. ARM Template, CLI). You can learn more about creating an Azure Synapse Analytics workspace with DEP at Create a workspace with data exfiltration protection enabled - Azure Synapse Analytics.
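As a rough illustration of the programmatic route, the sketch below builds the workspace resource of an ARM template as a Python dictionary. The property names (`managedVirtualNetwork`, `managedVirtualNetworkSettings`, `preventDataExfiltration`, `allowedAadTenantIdsForLinking`) reflect my understanding of the `Microsoft.Synapse/workspaces` schema; the `apiVersion`, the helper name `build_workspace_resource` and all example values are assumptions, so check them against the current ARM template reference before use.

```python
import json


def build_workspace_resource(name, location, storage_account_url, filesystem,
                             extra_tenant_ids=()):
    """Return an ARM-style resource dict for a Synapse workspace with DEP enabled.

    Hypothetical helper: names and values are placeholders for illustration.
    """
    return {
        "type": "Microsoft.Synapse/workspaces",
        "apiVersion": "2021-06-01",  # assumption: verify the current API version
        "name": name,
        "location": location,
        "identity": {"type": "SystemAssigned"},
        "properties": {
            "defaultDataLakeStorage": {
                "accountUrl": storage_account_url,
                "filesystem": filesystem,
            },
            # Corresponds to the 'Managed virtual network' option.
            "managedVirtualNetwork": "default",
            "managedVirtualNetworkSettings": {
                # Corresponds to 'Allow outbound data traffic only to approved targets'.
                "preventDataExfiltration": True,
                # Additional Azure AD tenants to approve (placeholder IDs only).
                "allowedAadTenantIdsForLinking": list(extra_tenant_ids),
            },
        },
    }


resource = build_workspace_resource(
    name="example-synapse-ws",
    location="westeurope",
    storage_account_url="https://examplestorage.dfs.core.windows.net",
    filesystem="examplefs",
)
print(json.dumps(resource, indent=2))
```

Because DEP cannot be switched on after creation, it is worth reviewing these two settings in the template before the first deployment.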
Before we discuss how DEP applies to Synapse Pipelines, it is important to level-set on some Synapse Pipelines specific concepts – if you are familiar with Synapse Pipelines or Azure Data Factory you can skip over this section and jump to Synapse Pipeline connectivity without DEP enabled.
For a more generalized introduction to Synapse Pipelines check out this doc article.
Synapse Pipelines enables users to connect to a range of different data services, through what is called a Linked Service. Synapse Pipelines supports a wide range of connectors to different services including:
A full list of the supported connectors is available with this link.
When a user creates a Linked Service, they must choose the Integration Runtime that will execute activities using it. There are two types of Integration Runtime: the Azure Integration Runtime (AIR) and the Self-Hosted Integration Runtime (SHIR).
It’s important to note that there are some differences offered by AIRs and SHIRs – most notably that Data Flows can only be executed on AIRs. For more information including some of the feature differences please read https://docs.microsoft.com/en-us/azure/data-factory/concepts-integration-runtime.
It should also be noted that some Linked Services can only be used with certain Integration Runtime types, read Pipelines and activities - Azure Data Factory & Azure Synapse for more details.
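To make the relationship between a Linked Service and its Integration Runtime concrete, here is a minimal sketch of a Linked Service definition built as a Python dictionary. The `connectVia` reference is how, to my knowledge, the JSON definition points at an Integration Runtime; the helper name `build_linked_service`, the storage endpoint and the Linked Service name are hypothetical, and `AutoResolveIntegrationRuntime` is assumed to be the name of the default Azure Integration Runtime.

```python
import json


def build_linked_service(name, service_type, type_properties, integration_runtime):
    """Return a Linked Service definition bound to the named Integration Runtime."""
    return {
        "name": name,
        "properties": {
            "type": service_type,
            "typeProperties": type_properties,
            # The Integration Runtime that executes activities for this Linked Service.
            "connectVia": {
                "referenceName": integration_runtime,
                "type": "IntegrationRuntimeReference",
            },
        },
    }


linked_service = build_linked_service(
    name="ExampleBlobStorage",  # hypothetical name
    service_type="AzureBlobStorage",
    type_properties={
        "serviceEndpoint": "https://examplestorage.blob.core.windows.net"
    },
    integration_runtime="AutoResolveIntegrationRuntime",  # assumed default AIR name
)
print(json.dumps(linked_service, indent=2))
```

Pointing the same definition at a Self-Hosted Integration Runtime would only change the `referenceName` value.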
Without DEP enabled, users who have appropriate privileges within an Azure Synapse Analytics workspace can run pipelines which connect to a range of different services (through Linked Services), using an Azure Integration Runtime.
Therefore, without DEP, an appropriately permissioned user may be able to read data from or write data to a Linked Service in a way which violates an organization's policy. This could occur due to a compromised account, a malicious user or a lack of awareness of an organization's policy.
It’s important to note that DEP is only one layer of protection that applies to Azure Synapse Analytics. Review the Azure Synapse Analytics Security Whitepaper for more information on the multiple layers of security within Azure Synapse Analytics.
With DEP enabled, the behavior outlined above changes. When using the Azure Integration Runtime, DEP enables you to limit connections from Synapse Pipelines to services in approved Azure AD tenants, reached through managed private endpoints.
By default, the Azure AD tenant within which the Azure Synapse Analytics workspace is created is allowed, and does not need to be added for connectivity within the same Azure AD tenant to work. You can also configure additional Azure AD tenants you would like to allow connections to. This can be done at the point of workspace creation or at any point after that.
When using the Azure Integration Runtime with DEP enabled, Linked Service connections (that is to say, connections to other services) must occur through managed private endpoints. The services which are supported within Azure Synapse Analytics managed private endpoints (at the time of writing) are:
For more information on how to set up a managed private endpoint within an Azure Synapse Analytics workspace, check out this link. Note that this process requires appropriate permissions both within Azure Synapse Analytics and within the service you are making the connection to. In Azure Synapse Analytics users will require ‘workspaces/managedPrivateEndpoint/write, delete’ permissions, which the Synapse Administrator and Synapse Linked Data Manager roles have with Synapse RBAC.
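For readers who script this step, a managed private endpoint is described by the resource it targets and the sub-resource (group) it connects to. The sketch below builds such a definition as a Python dictionary. The field names `privateLinkResourceId` and `groupId` reflect my understanding of the managed private endpoint definition; the helper name, the endpoint name, the all-zero subscription ID and the resource names are placeholders, so treat this as an assumption-laden sketch rather than a reference.

```python
import json


def build_managed_private_endpoint(name, target_resource_id, group_id):
    """Return a managed private endpoint definition for the given target resource."""
    return {
        "name": name,
        "properties": {
            # Full Azure resource ID of the service being connected to.
            "privateLinkResourceId": target_resource_id,
            # Sub-resource of the target, e.g. "blob" or "dfs" for a storage account.
            "groupId": group_id,
        },
    }


# Placeholder resource ID: substitute your own subscription, group and account.
storage_id = (
    "/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/example-rg"
    "/providers/Microsoft.Storage/storageAccounts/examplestorage"
)

endpoint = build_managed_private_endpoint("example-mpe-dfs", storage_id, "dfs")
print(json.dumps(endpoint, indent=2))
```

The definition only covers the Synapse side; the permissions required in the target service, mentioned above, still apply.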
Given DEP places restrictions on what and how connections are made to other services, this necessarily means that those Linked Services which do not support managed private endpoints cannot be connected to an Azure Integration Runtime.
This table provides a high-level summary of whether a Linked Service will work within Azure Synapse Analytics with DEP enabled.
| | Service is not supported with Synapse Managed Private Hub | Service is supported within Synapse Managed Private Hub |
| --- | --- | --- |
| Outside an approved Azure AD tenant | Not accessible | Not accessible |
| Within an approved Azure AD tenant | Not accessible | Accessible once a managed private endpoint is created. |
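The summary above reduces to a single rule, sketched here in Python: a Linked Service is reachable from the Azure Integration Runtime only when the tenant is approved, the service supports managed private endpoints, and the endpoint has actually been created. The function and parameter names are hypothetical and exist purely to illustrate the logic.

```python
from itertools import product


def is_accessible_with_dep(tenant_approved: bool,
                           supports_managed_private_endpoint: bool,
                           endpoint_created: bool = False) -> bool:
    """Return True if a service is reachable from the Azure IR with DEP enabled."""
    # All three conditions must hold; failing any one blocks the connection.
    return (tenant_approved
            and supports_managed_private_endpoint
            and endpoint_created)


# Enumerate every combination to reproduce the summary table.
for approved, supported, created in product([False, True], repeat=3):
    result = is_accessible_with_dep(approved, supported, created)
    print(f"approved={approved!s:5} supported={supported!s:5} "
          f"created={created!s:5} -> accessible={result}")
```

Only one of the eight combinations yields an accessible service, which matches the single "Accessible" cell in the summary.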
Some common scenarios that will not work when using the Azure Integration Runtime include:
It’s important to note that working around the constraints of DEP should be something that is worked through as part of any security review to ensure that your Azure Synapse Analytics deployment remains compliant with your organizational policies and requirements.
The primary way to address the constraints of DEP when using the Azure Integration Runtime is to leverage the Self-Hosted Integration Runtime. As a Self-Hosted Integration Runtime is deployed on infrastructure you manage, you and your organization can fully control, through traditional networking controls (e.g. Proxy, outbound Firewall), which endpoints it can connect to. DEP does not impact the behavior of Self-Hosted Integration Runtimes.
Therefore, if you need to connect to endpoints which are not available when using DEP, you can choose to execute that activity on a Self-Hosted Integration Runtime instead of the Azure Integration Runtime. Because you can log, control and limit what a Self-Hosted Integration Runtime does, this approach should still allow your organization’s compliance, regulatory or other policy requirements to be met.
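The choice of runtime described here can be sketched as a small decision function. It is an illustration only: the function name, parameter names and constants are hypothetical, and the Data Flow restriction comes from the earlier note that Data Flows can only be executed on Azure Integration Runtimes.

```python
AZURE_IR = "Azure Integration Runtime"
SELF_HOSTED_IR = "Self-Hosted Integration Runtime"


def choose_integration_runtime(reachable_via_managed_private_endpoint: bool,
                               is_data_flow: bool = False) -> str:
    """Pick a runtime for an activity in a workspace with DEP enabled."""
    if reachable_via_managed_private_endpoint:
        # The target is in an approved tenant and has a managed private endpoint.
        return AZURE_IR
    if is_data_flow:
        # Data Flows cannot fall back to a Self-Hosted Integration Runtime.
        raise ValueError(
            "Data Flows can only run on the Azure Integration Runtime, "
            "so this target is not reachable with DEP enabled."
        )
    # Other activities can run on infrastructure you manage and control.
    return SELF_HOSTED_IR


print(choose_integration_runtime(True))
print(choose_integration_runtime(False))
```

Raising an error for the Data Flow case, rather than silently returning a runtime, keeps the unsupported combination visible during a security review.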
If you need the protections that DEP provides – then yes of course you should enable DEP. If you don’t need those guarantees, then you should very carefully consider the constraints DEP will impose on your Azure Synapse Analytics workspace and whether they make sense given the scope and vision for your Azure Synapse Analytics project.
DEP imposes a particular set-up of network security controls, but within Azure Synapse Analytics network security is simply one of many layers of security. You can find out more information about how Azure Synapse Analytics works with the other layers in our security whitepaper available here. For many customers the advantages of DEP are not worth these constraints, and a combination of appropriate source control, release process and RBAC controls meets their needs.
As you can see DEP can provide additional protections for your Azure Synapse Analytics deployments, but these protections come with capability trade-offs. You can find out more information about DEP at Data exfiltration protection for Azure Synapse Analytics workspaces - Azure Synapse Analytics | Micr....
My colleague Vengatesh has a number of videos available on the Azure Synapse Analytics YouTube channel which can further your learnings.
For those of you just getting started with Azure Synapse Analytics I would highly recommend our Azure Synapse Success by Design guidance, which includes a great Proof of concept playbook and our implementation success methodology.
Finally, we’d love for you to leave a comment on how you found this blog, any experiences you have had with DEP and any future topics you'd like to see covered.
Our team publishes blog(s) regularly and you can find all these blogs here: https://aka.ms/synapsecseblog
For a deeper understanding of Synapse implementation best practices, please refer to our Success By Design (SBD) site: https://aka.ms/Synapse-Success-By-Design