Azure Synapse Analytics Blog

Synapse pipelines storage event trigger security deep dive

vengat83MSFT
Microsoft
Apr 03, 2023

Author(s):

Prashant Atri is a Senior Cloud Solution Architect in Global Partner Solutions (GPS), US team.

Vengatesh Parasuraman is a Senior Program Manager in Azure Synapse Customer Success Engineering (CSE) team.

 

Azure Data Factory (ADF) and Synapse pipelines can trigger pipeline execution on events such as storage blob creation or deletion. Customers can use this to implement event-driven pipeline orchestration. To learn how to create storage event triggers in ADF/Synapse pipelines, see "Create event-based triggers - Azure Data Factory & Azure Synapse" under Supporting documentation at the end of this post.

 

In most cases, customers protect their storage account connections using firewall rules or private links. This can make it challenging to implement storage event triggers in ADF or Synapse Pipelines. In this blog post, we will address some common security questions and challenges that may arise during storage event trigger creation.

 

Here are some frequently asked questions that arise while setting up this trigger for protected storage accounts:

  1. Why do we need to enable the “Allow Azure services on the trusted services list” option on the storage account during storage event trigger creation?
  2. Can we disable the “Allow Azure services on the trusted services list” option after the storage event trigger creation?
  3. Can I use the resource instance rule to allow access to the storage account instead?
  4. How does the communication happen between the Storage account, Event Grid and ADF/Synapse pipelines and will my triggers continue to work even if I disable public access to my Synapse workspace?

 

Questions 1 and 2:

Let’s look at what happens when we create a storage event trigger in ADF/Synapse. The following diagram shows the underlying communication that happens between the various services during trigger creation:

 

 

 

When creating a storage event trigger in ADF/Synapse Pipeline, the service (ADF/Synapse) performs a permissions check to ensure that the user attempting to create the Storage Event trigger has appropriate access to the relevant storage account. If the permission check fails, trigger creation also fails. There is no incoming connection request to the storage account at this point.

 

While the trigger is being created, started, or stopped, Event Grid communicates with the storage account to create the Event Grid subscription. This is where the "Allow Azure services on the trusted services list" option on the storage account comes into play, because Event Grid makes an incoming connection to the storage account at this point.

 

After the trigger has been created and started successfully, customers can disable the "Allow Azure services on the trusted services list" option on the storage account to meet security or compliance requirements. However, there are some scenarios where you need to enable it again, for example:

  1. Any new storage event trigger creation
  2. Trigger deletion/re-creation
  3. Start/Stop existing storage event triggers

 

Customers can automate the overall trigger deployment with custom scripts that temporarily enable the "Allow Azure services on the trusted services list" option during any of the above three scenarios and disable it again afterwards.
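
As a rough illustration of such a script, the Python sketch below wraps a trigger operation in a context manager that temporarily adds AzureServices to the storage account's network rule bypass setting and restores the original value afterwards. The names with_trusted_services and trusted_services_enabled, and the get_bypass/set_bypass callables, are hypothetical. In a real deployment you would connect them to whatever tooling you already use to read and update the storage account's network rules. The sketch also assumes the bypass setting is a comma-separated string such as "Logging, Metrics".

```python
from contextlib import contextmanager
from typing import Callable, Iterator

TRUSTED = "AzureServices"


def _parse(bypass: str) -> list[str]:
    """Split a bypass string such as "Logging, Metrics" into its parts."""
    parts = [p.strip() for p in bypass.split(",") if p.strip()]
    # "None" means no bypass at all, so it carries no parts of its own.
    return [p for p in parts if p != "None"]


def with_trusted_services(bypass: str) -> str:
    """Return the bypass string with AzureServices added if it is missing."""
    parts = _parse(bypass)
    if TRUSTED not in parts:
        parts.append(TRUSTED)
    return ", ".join(parts)


@contextmanager
def trusted_services_enabled(
    get_bypass: Callable[[], str],
    set_bypass: Callable[[str], None],
) -> Iterator[None]:
    """Enable the trusted-services bypass for the duration of the block.

    get_bypass and set_bypass are hypothetical callables supplied by the
    caller; they read and write the storage account's bypass setting.
    """
    original = get_bypass()
    needs_change = TRUSTED not in _parse(original)
    if needs_change:
        set_bypass(with_trusted_services(original))
    try:
        yield  # create, delete/re-create, start, or stop the trigger here
    finally:
        # Restore the original setting even if the trigger operation failed.
        if needs_change:
            set_bypass(original)
```

A deployment script would then run the trigger operation inside "with trusted_services_enabled(get_bypass, set_bypass):". The finally clause is the important design choice: the storage account returns to its locked-down state even when the trigger operation raises an error.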

 

What happens if we disable "Allow Azure services on the trusted services list"?

If the "Allow Azure services on the trusted services list" option is disabled on the storage account, any attempt to create, start, or stop the trigger fails with error code 400 or 500, for example:

 

The attempt to configure storage notifications for the provided storage account xxxxxxx failed. Please ensure that your storage account meets the requirements described at https://aka.ms/storageevents. The error is Failed to retrieve credentials for request=RequestUri=https://management.azure.com/subscriptions/xxxxxxx/resourceGroups/xxxxxx-rg/providers/Microsoft.Storage/storageAccounts/xxxxxxx/listAccountSas, Method=POST, response=StatusCode=400, StatusDescription=Bad Request, IsSuccessStatusCode=False, Content=System.Net.HttpWebResponse, responseContent={"error":{"code":"InvalidValuesForRequestParameters","message":"Values for request parameters are invalid: keyToSign."}}
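
An automation script may want to recognize this failure so that it can enable the option and retry. The Python sketch below is one way to do that, assuming the error text always resembles the single sample above. The function name and the two marker substrings it looks for are assumptions drawn from that sample, not a documented error contract.

```python
import re
from typing import Optional

# Matches the three-digit HTTP status embedded in the error text.
_STATUS = re.compile(r"StatusCode=(\d{3})")


def storage_notification_failure_status(message: str) -> Optional[int]:
    """Return the embedded HTTP status if the message looks like the
    storage-notification failure shown above, otherwise None.

    The marker substrings come from one sample message only, so treat
    this check as a heuristic.
    """
    looks_like_failure = (
        "Failed to retrieve credentials" in message
        and "listAccountSas" in message
    )
    if not looks_like_failure:
        return None
    match = _STATUS.search(message)
    return int(match.group(1)) if match else None
```

A script could call this on the text of a failed trigger operation and, when it returns a status code, temporarily enable the trusted services option before retrying.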

 

 

Question 3 – Can I use the resource instance rule to allow access to the storage account instead?

Trusted access to the storage account based on a resource's managed identity does not work with the system-managed Event Grid topic. The Event Grid system topic is only created after the trigger is created or started, and only then can the managed identity of the system topic be enabled. To configure the resource instance exception on the storage account, however, the Event Grid system topic and its managed identity must already exist. Because of this circular dependency, storage event trigger creation works, by design, only with the "Allow Azure services on the trusted services list" option.
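
The circular dependency can be made concrete with a small Python sketch that models each step as a node in a dependency graph and asks the standard-library graphlib module for a valid order. The node names and the creation_order helper are illustrative labels for the steps described above, not Azure resource types or APIs.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Optional


def creation_order(depends_on: dict[str, set[str]]) -> Optional[list[str]]:
    """Return one valid order of steps, or None if the steps form a cycle."""
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError:
        return None


# Each key can only happen after everything in its set has happened.
RESOURCE_INSTANCE_RULE = {
    "trigger started": {"Event Grid can reach storage"},
    "Event Grid can reach storage": {"resource instance rule"},
    "resource instance rule": {"system topic managed identity"},
    "system topic managed identity": {"system topic"},
    "system topic": {"trigger started"},  # closes the loop
}

TRUSTED_SERVICES_OPTION = {
    "trigger started": {"Event Grid can reach storage"},
    "Event Grid can reach storage": {"trusted services option"},
    "trusted services option": set(),  # can be enabled up front
    "system topic": {"trigger started"},
}

if __name__ == "__main__":
    print(creation_order(RESOURCE_INSTANCE_RULE))
    print(creation_order(TRUSTED_SERVICES_OPTION))
```

With the resource instance rule, no valid order exists because the system topic both requires and is required by the started trigger. With the trusted services option, access can be granted before any Event Grid resource exists, so the steps can be ordered.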

 

Even if you select all Event Grid system topics for the resource instance exception, it will not work for creating storage event triggers:

 

 

 

Question 4 – How does the communication happen between the Storage account, Event Grid and ADF/Synapse pipelines and will my triggers continue to work even if I disable public access to my Synapse workspace?

 

The following diagram explains the communication flow between the various services after the trigger creation:

 

 

 

The storage account pushes the event to Event Grid, and Event Grid pushes the event to the Data Factory control plane (this is the webhook URL you see on the event subscription, for example xxx.svc.datafactory.azure.com). The Data Factory control plane, not your specific Synapse workspace, is the subscriber to your events. Event Grid does not communicate directly with your Synapse workspace to run your pipeline, so you can safely disable public access to your Synapse workspace: Event Grid never calls your workspace dev endpoint. The internal Data Factory backend service that receives the event is responsible for running your pipeline, and that communication is internal. It does not go through the external HTTP dev endpoint of your Synapse workspace.
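
To check this on your own event subscription, you can inspect the host of its webhook URL. The Python sketch below is a minimal check, assuming the control plane host ends in .svc.datafactory.azure.com as in the example above; other host suffixes may exist. The function name is hypothetical, and how you obtain the webhook URL from the subscription is left to your own tooling.

```python
from urllib.parse import urlsplit

# Suffix taken from the example host in this post; other suffixes may exist.
DATA_FACTORY_SUFFIX = ".svc.datafactory.azure.com"


def targets_data_factory_control_plane(webhook_url: str) -> bool:
    """Return True if the webhook host is under the Data Factory suffix."""
    host = urlsplit(webhook_url).hostname or ""
    return host.lower().endswith(DATA_FACTORY_SUFFIX)
```

If this returns True for your event subscription, the subscriber is the shared control plane rather than your workspace, which is why closing the workspace's public endpoint does not affect event delivery.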

 

Supporting documentation:

Create event-based triggers - Azure Data Factory & Azure Synapse

Configure Azure Storage firewalls and virtual networks 

 

Updated Mar 31, 2023
Version 1.0
  • versydney
    Iron Contributor

    This is very insightful and helps us understand why these settings are needed (or not needed).

  • richardafraser
    Copper Contributor

    As far as I can see, this only works with blobs. It would be ideal for us to have triggers on Azure Files events. Our users drop files into their file share, and having this automatically trigger the load into Synapse would save a lot of coding. Do we know if this feature is being looked at for the future?

  • richardafraser Thanks for the feedback. The feature is blob event trigger at present. We will look into adding more functionalities such as file events also in the future.

  • Federico2
    Copper Contributor

    Thanks for the detailed explanation. That helped me solve the problem I had when trying to create a storage trigger.

    I was previously using what I think should be a more relaxed security set-up, which is allowing public access to the storage account. However, that made the publishing in SA fail, until I changed the set-up to the more restrictive case of Firewall rules and "Allow Azure services on the trusted services list to access this storage account" enabled. 

    Do you have an explanation for why it did not work when public access was allowed?