Synapse pipelines storage event trigger security deep dive
Published Apr 03 2023 08:00 AM
Microsoft


Author(s):

Prashant Atri is a Senior Cloud Solution Architect in Global Partner Solutions (GPS), US team.

Vengatesh Parasuraman is a Senior Program Manager in Azure Synapse Customer Success Engineering (CSE) team.

 

Azure Data Factory (ADF) and Synapse pipelines can trigger a pipeline run in response to events such as the creation or deletion of a storage blob. Customers can use this to implement event-driven pipeline orchestration. To learn how to create storage event triggers in ADF/Synapse pipelines, refer to the "Create event-based triggers" documentation listed under Supporting documentation at the end of this post.

 

Many customers protect their storage accounts with firewall rules or private endpoints (Private Link). This can make it challenging to implement storage event triggers in ADF or Synapse pipelines. In this blog post, we address some common security questions and challenges that arise when creating storage event triggers.

 

Here are some frequently asked questions that come up when setting up this trigger for protected storage accounts:

  1. Why do we need to enable the “Allow Azure services on the trusted services list” option on the storage account when creating a storage event trigger?
  2. Can we disable the “Allow Azure services on the trusted services list” option after the storage event trigger is created?
  3. Can I use the resource instance rule to allow access to the storage account instead?
  4. How does communication happen between the storage account, Event Grid, and ADF/Synapse pipelines, and will my triggers continue to work if I disable public access to my Synapse workspace?

 

Questions 1 and 2:

Let’s look at what happens when we create a storage event trigger in ADF/Synapse. The following diagram shows the underlying communication between the various services when a trigger is created:

 

[Diagram: communication between ADF/Synapse, Event Grid, and the storage account during storage event trigger creation]

 

 

When you create a storage event trigger in an ADF or Synapse pipeline, the service (ADF/Synapse) first checks that the user creating the trigger has appropriate access to the relevant storage account. If this permission check fails, trigger creation fails. No inbound connection is made to the storage account at this point.

 

When the trigger is created, started, or stopped, Event Grid communicates with the storage account to configure the Event Grid subscription. This is where the "Allow Azure services on the trusted services list" option on the storage account comes into play: Event Grid makes an inbound connection to the storage account at this point, so the storage firewall must let it through.
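Whether that inbound connection from Event Grid gets through depends on the storage account's firewall settings. The Python sketch below is a minimal illustration. It assumes the ARM `networkAcls` shape, in which `defaultAction` is `Allow` or `Deny` and `bypass` is a comma-separated string such as `"Logging, Metrics, AzureServices"`. The portal's "Allow Azure services on the trusted services list" checkbox corresponds to `AzureServices` appearing in `bypass`. The function name is hypothetical, and settings outside `networkAcls` (for example `publicNetworkAccess`) are deliberately not modeled.

```python
from typing import Any, Mapping


def event_grid_can_reach_storage(network_acls: Mapping[str, Any]) -> bool:
    """Return True if the storage firewall would admit Event Grid during
    trigger create/start/stop, judged only from the networkAcls block.

    Assumes the ARM shape: defaultAction is "Allow" or "Deny", and bypass is a
    comma-separated string such as "Logging, Metrics, AzureServices".
    Settings outside networkAcls (e.g. publicNetworkAccess) are ignored.
    """
    # With defaultAction "Allow" (the ARM default) the firewall is effectively open.
    if network_acls.get("defaultAction", "Allow") == "Allow":
        return True
    # Otherwise Event Grid is admitted only through the trusted-services bypass.
    bypass = network_acls.get("bypass") or ""
    entries = {part.strip() for part in bypass.split(",") if part.strip()}
    return "AzureServices" in entries


if __name__ == "__main__":
    locked_down = {"defaultAction": "Deny", "bypass": "None"}
    trusted = {"defaultAction": "Deny", "bypass": "Logging, Metrics, AzureServices"}
    print(event_grid_can_reach_storage(locked_down))  # False
    print(event_grid_can_reach_storage(trusted))  # True
```

A deployment script could run a check like this before creating, starting, or stopping a trigger and fail early with a clear message.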

 

After the trigger has been created and started successfully, customers can disable the "Allow Azure services on the trusted services list" option on the storage account to meet security or compliance requirements. However, you must re-enable it in some scenarios, for example:

  1. Creating any new storage event trigger
  2. Deleting or re-creating a trigger
  3. Starting or stopping an existing storage event trigger

 

Customers can automate trigger deployment with custom scripts that temporarily enable the "Allow Azure services on the trusted services list" option during any of the three scenarios above and disable it again afterwards.
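Such a script could wrap each trigger operation so that the option is switched on just before the operation and restored just after it. The Python sketch below shows one way to do this under stated assumptions. `get_acls` and `set_acls` are hypothetical hooks that you would back with the Azure CLI (for example `az storage account update --bypass ...`) or an Azure SDK. The account's firewall settings are assumed to follow the ARM `networkAcls` shape, where `bypass` is a comma-separated string. This is an illustration, not an official tool.

```python
from contextlib import contextmanager
from typing import Callable, Dict, Iterator

# Hypothetical shape: the storage account's networkAcls as a plain dict,
# e.g. {"defaultAction": "Deny", "bypass": "Logging, Metrics"}.
Acls = Dict[str, str]


def with_azure_services(bypass: str) -> str:
    """Return the bypass string with AzureServices added; other entries are
    kept and the placeholder value "None" is dropped."""
    entries = [p.strip() for p in bypass.split(",")]
    entries = [p for p in entries if p and p != "None"]
    if "AzureServices" not in entries:
        entries.append("AzureServices")
    return ", ".join(entries)


@contextmanager
def trusted_services_temporarily(
    get_acls: Callable[[], Acls], set_acls: Callable[[Acls], None]
) -> Iterator[None]:
    """Enable the trusted-services bypass for the duration of the with-block,
    then restore the original networkAcls, even if the block raises."""
    original = dict(get_acls())
    updated = dict(original)
    updated["bypass"] = with_azure_services(original.get("bypass", ""))
    set_acls(updated)
    try:
        yield
    finally:
        set_acls(original)


if __name__ == "__main__":
    # In-memory stand-in for a real storage account (hypothetical).
    account = {"networkAcls": {"defaultAction": "Deny", "bypass": "None"}}

    def read_acls() -> Acls:
        return account["networkAcls"]

    def write_acls(acls: Acls) -> None:
        account["networkAcls"] = acls

    with trusted_services_temporarily(read_acls, write_acls):
        # Create, start, stop, or re-create the storage event trigger here.
        print("during:", account["networkAcls"]["bypass"])  # during: AzureServices
    print("after:", account["networkAcls"]["bypass"])  # after: None
```

The `try`/`finally` is the key design choice. If the trigger operation fails, the storage account still returns to its original, locked-down setting. Note that the sketch restores the snapshot it took at the start, so it would overwrite any unrelated firewall change made while the block runs. Keep that window short.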

 

What happens if we disable "Allow Azure services on the trusted services list"?

If the "Allow Azure services on the trusted services list" option is disabled on the storage account, creating, starting, or stopping the trigger fails with an HTTP 400 or 500 error similar to the following:

 

The attempt to configure storage notifications for the provided storage account xxxxxxx failed. Please ensure that your storage account meets the requirements described at https://aka.ms/storageevents. The error is Failed to retrieve credentials for request=RequestUri=https://management.azure.com/subscriptions/xxxxxxx/resourceGroups/xxxxxx-rg/providers/Microsoft.Stor..., Method=POST, response=StatusCode=400, StatusDescription=Bad Request, IsSuccessStatusCode=False, Content=System.Net.HttpWebResponse, responseContent={"error":{"code":"InvalidValuesForRequestParameters","message":"Values for request parameters are invalid: keyToSign."}}
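When a deployment script hits a failure like this, it can help to flag it as a likely trusted-services issue rather than a generic error. The minimal Python sketch below matches on the status code and the opening phrase of the sample message above. Both are assumptions drawn from that one sample, not a documented error contract. The function name is hypothetical.

```python
import re

# Matches the HTTP status embedded in messages like the sample above,
# e.g. "response=StatusCode=400, StatusDescription=Bad Request".
_STATUS_CODE = re.compile(r"StatusCode=(\d{3})")

# Phrase taken from the sample message; an assumption, not a documented contract.
_STORAGE_NOTIFICATION_PHRASE = "attempt to configure storage notifications"


def looks_like_trusted_services_failure(message: str) -> bool:
    """Heuristic: True if a trigger create/start/stop error resembles the failure
    seen when the trusted-services option is disabled on the storage account."""
    match = _STATUS_CODE.search(message)
    if match is None or match.group(1) not in {"400", "500"}:
        return False
    return _STORAGE_NOTIFICATION_PHRASE in message.lower()


if __name__ == "__main__":
    sample = (
        "The attempt to configure storage notifications for the provided storage "
        "account xxxxxxx failed. ... response=StatusCode=400, "
        "StatusDescription=Bad Request"
    )
    print(looks_like_trusted_services_failure(sample))  # True
```

A script could use this to print a hint to re-enable the option temporarily, instead of retrying blindly.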

 

 

Question 3 – Can I use the resource instance rule to allow access to the storage account instead?

Trusted access based on a resource's managed identity (a resource instance rule) does not work with the Event Grid system topic that the trigger uses. The Event Grid system topic is created only after the trigger is created or started, and only then can you enable the system topic's managed identity. However, to configure a resource instance exception on the storage account, the Event Grid system topic and its managed identity must already exist. Because of this circular dependency, storage event trigger creation works only with the "Allow Azure services on the trusted services list" option, by design.
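The circular dependency can be made concrete as a small dependency graph. The Python sketch below (Python 3.9+ for `graphlib`) uses illustrative step names, not Azure resource or API names, to model both options. With a resource instance rule, the steps form a cycle, so no valid order exists. The trusted-services bypass has no prerequisites and breaks the cycle.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Optional, Set

# Each key maps a step to the steps that must be done before it.
# Step names are illustrative labels, not Azure resource or API names.
Graph = Dict[str, Set[str]]

WITH_RESOURCE_INSTANCE_RULE: Graph = {
    "create_or_start_trigger": {"storage_admits_event_grid"},
    "storage_admits_event_grid": {"resource_instance_rule"},
    "resource_instance_rule": {"system_topic_identity"},
    "system_topic_identity": {"system_topic"},
    # The system topic only exists once the trigger is created or started.
    "system_topic": {"create_or_start_trigger"},
}

WITH_TRUSTED_SERVICES: Graph = {
    "create_or_start_trigger": {"storage_admits_event_grid"},
    # The account-level bypass does not depend on any Event Grid resource.
    "storage_admits_event_grid": {"trusted_services_bypass"},
    "trusted_services_bypass": set(),
    "system_topic": {"create_or_start_trigger"},
}


def deployment_order(graph: Graph) -> Optional[List[str]]:
    """Return one valid order for the steps, or None if they form a cycle."""
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError:
        return None


if __name__ == "__main__":
    print(deployment_order(WITH_RESOURCE_INSTANCE_RULE))  # None
    print(deployment_order(WITH_TRUSTED_SERVICES))
```

In the second graph, the only possible order is to enable the bypass, let Event Grid in, create or start the trigger, and then have the system topic appear.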

 

Even if you select all Event Grid system topics for the resource instance exception, storage event trigger creation still fails:

 

[Screenshot: resource instance exception configured for all Event Grid system topics]

 

 

Question 4 – How does communication happen between the storage account, Event Grid, and ADF/Synapse pipelines, and will my triggers continue to work if I disable public access to my Synapse workspace?

 

The following diagram explains the communication flow between the various services after the trigger is created:

 

[Diagram: event flow from the storage account through Event Grid to the Data Factory control plane after trigger creation]

 

 

The storage account pushes the event to Event Grid, and Event Grid pushes the event to the Data Factory control plane (the webhook URL you see on the event subscription, for example xxx.svc.datafactory.azure.com). The Data Factory control plane, not your specific Synapse workspace, is the subscriber to your events. Event Grid does not communicate directly with your Synapse workspace to run your pipeline, so you can disable public access to your Synapse workspace: Event Grid never calls your workspace's dev endpoint. The internal Data Factory backend service that receives the event is responsible for running your pipeline. That communication is internal and does not go through the external HTTP dev endpoint of your Synapse workspace.
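To check this in your own environment, look at the webhook endpoint of the event subscription the trigger created and see which host it points to. The small Python sketch below assumes two things: the `.svc.datafactory.azure.com` host suffix from the example above, and the standard `<workspace>.dev.azuresynapse.net` form of a Synapse workspace dev endpoint. Other regions or clouds may use different hosts. The function name and sample URL are hypothetical.

```python
from urllib.parse import urlsplit

# Host suffix taken from the example in this post (xxx.svc.datafactory.azure.com).
# Treat it as an assumption: other regions or clouds may use different hosts.
CONTROL_PLANE_SUFFIX = ".svc.datafactory.azure.com"
# Synapse workspace development endpoints have the form <workspace>.dev.azuresynapse.net.
SYNAPSE_DEV_SUFFIX = ".dev.azuresynapse.net"


def classify_subscriber(endpoint_url: str) -> str:
    """Label the webhook endpoint of an Event Grid subscription by its host."""
    host = urlsplit(endpoint_url).hostname or ""  # hostname is already lowercased
    if host.endswith(CONTROL_PLANE_SUFFIX):
        return "data-factory-control-plane"
    if host.endswith(SYNAPSE_DEV_SUFFIX):
        return "synapse-workspace-dev-endpoint"
    return "unknown"


if __name__ == "__main__":
    # Hypothetical endpoint; copy the real one from your trigger's event subscription.
    print(classify_subscriber("https://example.svc.datafactory.azure.com/hook"))
    # data-factory-control-plane
```

If the subscriber is the Data Factory control plane, the workspace's public network access setting is not on the event delivery path described above.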

 

Supporting documentation:

Create event-based triggers - Azure Data Factory & Azure Synapse

Configure Azure Storage firewalls and virtual networks 

 

Last update: Mar 31 2023 10:41 AM