
Azure PaaS Blog

Working with ADF Storage Event Trigger Over SFTP

Amrinder_Singh
Oct 24, 2022

 

Storage Event Trigger in Azure Data Factory is the building block for an event-driven ETL/ELT architecture (EDA). Data Factory's native integration with Azure Event Grid lets you trigger processing pipelines based on certain events. Currently, Storage Event Triggers support events on Azure Data Lake Storage Gen2 and General-Purpose version 2 storage accounts, including Blob Created and Blob Deleted.

 

Event-driven architecture (EDA) is a common data integration pattern that involves production, detection, consumption of, and reaction to events. Data integration scenarios often require customers to trigger pipelines based on events happening in a storage account, such as the arrival or deletion of a file in an Azure Blob Storage account. Data Factory and Synapse pipelines natively integrate with Azure Event Grid, which lets you trigger pipelines on such events.

 

The document and blogs below describe how you can create an ADF event trigger that runs an ADF pipeline in response to Azure Storage events:

Create event-based triggers - Azure Data Factory & Azure Synapse | Microsoft Learn

Create ADF Events trigger that runs an ADF pipeline in response to Azure Storage events. - Microsoft Community Hub

Storage Event Trigger - Permission and RBAC setting - Microsoft Community Hub

 

While the basic architecture and settings remain the same, in this blog we will mainly focus on SFTP-related storage events and the configuration changes you currently need to make to trigger an ADF pipeline.

 

Now, as mentioned, the basic steps remain the same when it comes to creating the trigger. We need to provide details such as the storage account name (an SFTP-enabled one) and the container name, along with the pattern (blob path begins with / blob path ends with) to match the triggering conditions.
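For orientation, here is a minimal sketch of what such a trigger definition looks like in ADF's JSON view, expressed as a Python dict. The storage account, container path, and pipeline names are placeholders:

```python
# Sketch of an ADF storage event trigger definition (BlobEventsTrigger).
# Account, paths, and pipeline names below are placeholders.
blob_events_trigger = {
    "name": "SftpBlobCreatedTrigger",
    "properties": {
        "type": "BlobEventsTrigger",
        "typeProperties": {
            # Resource ID of the SFTP-enabled storage account
            "scope": "/subscriptions/<sub-id>/resourceGroups/<rg>/providers"
                     "/Microsoft.Storage/storageAccounts/<sftp-account>",
            # Pattern to match the triggering conditions
            "blobPathBeginsWith": "/inbound/blobs/",
            "blobPathEndsWith": ".csv",
            "ignoreEmptyBlobs": True,
            "events": ["Microsoft.Storage.BlobCreated"],
        },
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": "ProcessSftpFile",
                    "type": "PipelineReference",
                }
            }
        ],
    },
}
```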

Once you perform the above step, the Event Grid configuration is automatically created at the backend. Below is how the configuration will look based on the filtering pattern that you selected while creating the trigger.
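As an illustration (the exact values depend on the pattern you chose), the auto-created event subscription filter typically resembles the sketch below; note that the data.api advanced filter lists only Blob Storage and Data Lake Gen2 REST APIs:

```python
# Sketch of the filter on the auto-created Event Grid event subscription.
# Values mirror the trigger's pattern; data.api holds the typical defaults.
default_filter = {
    "subjectBeginsWith": "/blobServices/default/containers/inbound/blobs/",
    "subjectEndsWith": ".csv",
    "includedEventTypes": ["Microsoft.Storage.BlobCreated"],
    "advancedFilters": [
        {
            "operatorType": "StringIn",
            "key": "data.api",
            # Blob Storage / Data Lake Gen2 REST APIs only -- no SFTP APIs
            "values": ["CopyBlob", "PutBlob", "PutBlockList", "FlushWithClose"],
        }
    ],
}
```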

If we look at the data APIs that currently get added, we will see mainly the Blob Storage REST APIs and the Data Lake Gen2 REST APIs. However, SFTP storage uses a different set of REST APIs, such as SftpCreate, SftpCommit, SftpRename, etc. You can monitor the REST APIs via diagnostic logging as well:

Monitoring Azure Blob Storage | Microsoft Learn
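If those diagnostic logs are routed to a Log Analytics workspace, here is a hedged sketch (using the azure-monitor-query package; the workspace ID is a placeholder) of listing which SFTP operations your client actually invokes:

```python
# Sketch: list SFTP operations recorded in blob diagnostic logs.
# Assumes the logs are sent to a Log Analytics workspace.
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

client = LogsQueryClient(DefaultAzureCredential())

# KQL: count SFTP operation names seen in the last 24 hours
query = """
StorageBlobLogs
| where OperationName startswith "Sftp"
| summarize count() by OperationName
"""

response = client.query_workspace(
    workspace_id="<log-analytics-workspace-id>",  # placeholder
    query=query,
    timespan=timedelta(days=1),
)

for table in response.tables:
    for row in table.rows:
        print(row)
```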

 

Based on these APIs, the corresponding SFTP events are generated, as discussed in the link below:

https://learn.microsoft.com/en-us/azure/event-grid/event-schema-blob-storage?tabs=event-grid-event-schema#sftp-events
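For illustration, here is a trimmed sketch of a Blob Created event raised by an SFTP upload; all values are placeholders (see the schema page above for the authoritative shape), but note the SFTP API name carried in data.api:

```python
# Trimmed sketch of a Microsoft.Storage.BlobCreated event raised over SFTP.
# Values are placeholders; see the Event Grid schema docs for the full shape.
sftp_blob_created_event = {
    "topic": "/subscriptions/<sub-id>/resourceGroups/<rg>/providers"
             "/Microsoft.Storage/storageAccounts/<sftp-account>",
    "subject": "/blobServices/default/containers/inbound/blobs/sales.csv",
    "eventType": "Microsoft.Storage.BlobCreated",
    "data": {
        "api": "SftpCommit",  # SFTP-specific API name the filter must match
        "blobType": "BlockBlob",
        "contentLength": 4096,
        "url": "https://<sftp-account>.blob.core.windows.net/inbound/sales.csv",
    },
}
```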

 

Now, with the default configuration that gets added to the filtering section, if we perform operations via the SFTP REST APIs, the event still gets generated, but it tends to get dropped because the filtering conditions do not match the corresponding data API. As a result, the trigger won't run the pipeline. Hence, we need to add the SFTP-specific REST APIs to the data.api section to match the triggering conditions of the event, such as below:
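In the portal, this is done by editing the advanced filters of the event subscription that the trigger created. As an alternative, here is a sketch of patching the filter programmatically with the azure-mgmt-eventgrid Python SDK; all resource names are placeholders, and SftpCreate/SftpCommit are the APIs relevant to Blob Created events:

```python
# Sketch: append the SFTP APIs to the data.api advanced filter on the
# event subscription that ADF created on the storage account's system topic.
from azure.identity import DefaultAzureCredential
from azure.mgmt.eventgrid import EventGridManagementClient
from azure.mgmt.eventgrid.models import StringInAdvancedFilter

SUBSCRIPTION_ID = "<sub-id>"  # placeholders throughout
RESOURCE_GROUP = "<rg>"
SYSTEM_TOPIC = "<storage-system-topic>"
EVENT_SUBSCRIPTION = "<adf-created-event-subscription>"

client = EventGridManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

sub = client.system_topic_event_subscriptions.get(
    RESOURCE_GROUP, SYSTEM_TOPIC, EVENT_SUBSCRIPTION
)

# Extend the StringIn filter on data.api so SFTP uploads also match.
sftp_apis = {"SftpCreate", "SftpCommit"}
for f in sub.filter.advanced_filters or []:
    if isinstance(f, StringInAdvancedFilter) and f.key == "data.api":
        f.values = sorted(set(f.values) | sftp_apis)

client.system_topic_event_subscriptions.begin_create_or_update(
    RESOURCE_GROUP, SYSTEM_TOPIC, EVENT_SUBSCRIPTION, sub
).result()
```

Note that, as one of the comments below points out, pausing and re-publishing the trigger recreates the event subscription with the default filter, so a patch like this may need to be re-applied (for example, as a post-deployment step in a release pipeline).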

Once this has been added and you upload a blob via the SFTP REST APIs, the ADF pipeline will trigger successfully.

 

Hope this helps!


8 Comments

  • Hi Shivabamini - Ideally you should see the Advanced Filter option in the Event Grid configuration after creating the trigger. If not, I would recommend raising a support case to troubleshoot the issue further.

  • Shivabamini
    Copper Contributor

    I was not able to see the Advanced filters section when creating the storage event trigger.

  • Stephen_James
    Copper Contributor

    We have found that the SFTP trigger stops working quite often. It looks like pausing the blob creation trigger removes the Event Grid configuration, and re-activating the trigger adds the Event Grid subscription back to the storage account, but without SftpCreate or SftpCommit.

    Pausing and re-activating triggers is part of our devops release as well as manually when we pause processing. 

     

    Is there a way to automate adding the 2 APIs to the event-grid?

    Are there any plans to have the 2 SFTP APIs added to the event-grid when the DataFactory trigger is activated?

  • NCJ
    Copper Contributor

    The screenshots you've shown indicate a GUI to modify the event filters; I find this GUI in my Event Grid System Topic subscriptions, but ADF is not registering a subscriber when creating the file triggers, and so there's nothing to modify.  These instructions are unclear; what is being done to ensure the ADF storage event triggers are registering within the Event Grid System Topics? 

     

     

    Update: We learned that when crossing subscriptions, you must give the ADF Managed Identity the storage account permissions, but you must also give the Azure Data Factory Application (which seems to be only one per subscription, even with multiple Data Factories) the correct EventGrid Contributor permissions for the ADF trigger to successfully register the listener on the System Topic of the storage account hosting your SFTP front end. Once this permission was granted to our ADF Application, publishing our triggers resulted in the correct registration of the System Topic listener, where we could then go and add the SFTPCommit event.

  • Hi saiprk, AmitAtre - I believe this might be a client behavior of how the upload is being performed, and something outside of SFTP Storage or ADF. Are you making use of WinSCP to perform the upload? If yes, it could be coming from the endurance setting (resume/transfer to a temporary filename) being enabled.

  • AmitAtre
    Copper Contributor

    saiprk, were you able to resolve the issue for FILENAME.csv.filepart? I was expecting the full file name; just having a part file suggests that the data is getting copied in chunks. I need all the data as part of a single file.

  • Hi Amrinder_Singh, I tried the above approach.

    @triggerBody().fileName works fine for smaller files like 20 KB. But when the file size exceeds 100 KB, @triggerBody().fileName returns FILENAME.csv.filepart.

    Is this expected?

     

  • rickdana
    Copper Contributor

    Hi Amrinder_Singh,

     

    Thanks for the input. You literally saved me a lot of time. I was blocked by this blob trigger not starting when uploading a file over SFTP. I tested your solution and now it works.

    Great job! 🤩