The Storage Event Trigger in Azure Data Factory (ADF) is a building block for an event-driven ETL/ELT architecture (EDA). Data Factory's native integration with Azure Event Grid lets you trigger processing pipelines based on certain events. Currently, Storage Event Triggers support the Blob Created and Blob Deleted events on Azure Data Lake Storage Gen2 and General Purpose version 2 storage accounts.
Event-driven architecture (EDA) is a common data integration pattern that involves production, detection, consumption, and reaction to events. Data integration scenarios often require customers to trigger pipelines based on events happening in a storage account, such as the arrival or deletion of a file in an Azure Blob Storage account. Data Factory and Synapse pipelines natively integrate with Azure Event Grid, which lets you trigger pipelines on such events.
This blog demonstrates how to use an ADF trigger to run an ADF pipeline in response to Azure Storage events.
Create an ADF resource in the Azure Portal. If you are new to ADF, please refer to this link on how to create one:
Create an Azure data factory using the Azure Data Factory UI - Azure Data Factory | Microsoft Docs
Once the Data Factory is created, open Azure Data Factory Studio from the Overview section:
Once we land on the ADF portal, create a linked service for the storage account as per the screenshots below:
After clicking ‘+New’, first select the data store. If you are using a GPv2 Blob Storage account, use ‘Azure Blob Storage’; if you are working with an ADLS Gen2 account, use ‘Azure Data Lake Storage Gen2’. I’ve used Gen2 in this demo.
After selecting the data store, fill in the required details as below:
Once the Test Connection is successful, click on ‘Create’. This will create the storage account linked service.
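For readers who prefer to see what the UI produces, here is a rough sketch of the JSON definition behind an ADLS Gen2 linked service, built as a Python dictionary. The linked service name, the storage account name, and the choice of account-key authentication are assumptions for illustration; the screenshots may use different values.

```python
import json


def build_adls_gen2_linked_service(name: str, account_name: str) -> dict:
    """Return a linked service definition for an ADLS Gen2 account.

    Assumes account-key authentication; the key is a placeholder only.
    """
    return {
        "name": name,
        "properties": {
            # "AzureBlobFS" is the connector type ADF uses for ADLS Gen2.
            "type": "AzureBlobFS",
            "typeProperties": {
                "url": f"https://{account_name}.dfs.core.windows.net",
                # Placeholder: never hard-code a real key in source files.
                "accountKey": {"type": "SecureString", "value": "<account-key>"},
            },
        },
    }


# Hypothetical names used only for this sketch.
linked_service = build_adls_gen2_linked_service("AdlsGen2LinkedService", "mystorageaccount")
print(json.dumps(linked_service, indent=2))
```

For a GPv2 Blob Storage account the connector type would differ, so treat this only as a shape to compare against the JSON view in ADF Studio.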
Creating Input and Output Datasets
In this demo, we will create a simple ADF pipeline that copies an ‘emp.txt’ file from one folder, ‘input’, to another folder, ‘output’, within the same container. Hence, we need input and output datasets in ADF that map to the blobs in the input and output folders. So let’s create InputDataset and OutputDataset:
Go to ‘Author’ on the ADF portal and click on ‘New Dataset’ as per the screenshot below:
Then click on ‘Ok’.
Similarly, you can create OutputDataset as below:
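The two datasets can also be pictured as JSON definitions. The sketch below builds both as Python dictionaries. It assumes a Binary dataset format, a container called `demo`, and a linked service called `AdlsGen2LinkedService`; none of these values come from the screenshots, so adjust them to your own setup.

```python
import json
from typing import Optional


def build_dataset(name: str, linked_service: str, file_system: str,
                  folder: str, file_name: Optional[str] = None) -> dict:
    """Return a Binary dataset definition pointing at an ADLS Gen2 folder or file."""
    location = {
        "type": "AzureBlobFSLocation",  # location type for ADLS Gen2
        "fileSystem": file_system,      # the container
        "folderPath": folder,
    }
    if file_name:
        location["fileName"] = file_name
    return {
        "name": name,
        "properties": {
            "type": "Binary",  # assumption: copy the file as-is, without parsing it
            "linkedServiceName": {
                "referenceName": linked_service,
                "type": "LinkedServiceReference",
            },
            "typeProperties": {"location": location},
        },
    }


# Hypothetical container and linked service names.
input_dataset = build_dataset("InputDataset", "AdlsGen2LinkedService", "demo", "input", "emp.txt")
output_dataset = build_dataset("OutputDataset", "AdlsGen2LinkedService", "demo", "output")
print(json.dumps([input_dataset, output_dataset], indent=2))
```

The output dataset leaves the file name out, so the copied file keeps its original name in the ‘output’ folder.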
Create the ADF pipeline to copy data from the ‘input’ folder to the ‘output’ folder as per the screenshots below:
Give a pipeline name and drag ‘Copy Data’ activity to the designer surface. Name the activity:
Select Source and Sink as below:
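Behind the designer, a pipeline with a single Copy activity is stored as JSON. The sketch below shows roughly what that looks like, assuming Binary datasets on ADLS Gen2 and the hypothetical names `CopyInputToOutput` and `CopyEmpFile` for the pipeline and the activity. If your datasets use another format, the source and sink types will differ.

```python
import json


def build_copy_pipeline(pipeline_name: str, activity_name: str,
                        input_dataset: str, output_dataset: str) -> dict:
    """Return a pipeline definition with one Copy activity (Binary source and sink)."""
    copy_activity = {
        "name": activity_name,
        "type": "Copy",
        "inputs": [{"referenceName": input_dataset, "type": "DatasetReference"}],
        "outputs": [{"referenceName": output_dataset, "type": "DatasetReference"}],
        "typeProperties": {
            # Assumption: Binary format on ADLS Gen2 for both ends of the copy.
            "source": {
                "type": "BinarySource",
                "storeSettings": {"type": "AzureBlobFSReadSettings"},
            },
            "sink": {
                "type": "BinarySink",
                "storeSettings": {"type": "AzureBlobFSWriteSettings"},
            },
        },
    }
    return {"name": pipeline_name, "properties": {"activities": [copy_activity]}}


pipeline = build_copy_pipeline("CopyInputToOutput", "CopyEmpFile", "InputDataset", "OutputDataset")
print(json.dumps(pipeline, indent=2))
```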
Now ‘Validate’ the pipeline and ‘Debug’ to check whether it works as expected.
Once the pipeline is validated, let’s create a BlobCreated event trigger as per the screenshot below:
Choose Trigger --> New:
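The values entered in the trigger dialog end up in a JSON definition of type `BlobEventsTrigger`. Here is a minimal sketch of that definition as a Python dictionary. The trigger name, pipeline name, container name, and the placeholder resource ID are assumptions for this demo, not values taken from the screenshot.

```python
import json


def build_storage_event_trigger(name: str, pipeline_name: str, storage_account_id: str,
                                container: str, folder: str,
                                events=("Microsoft.Storage.BlobCreated",)) -> dict:
    """Return a storage event trigger definition that starts one pipeline."""
    return {
        "name": name,
        "properties": {
            "type": "BlobEventsTrigger",
            "typeProperties": {
                # Path filter format: /<container>/blobs/<folder>/
                "blobPathBeginsWith": f"/{container}/blobs/{folder}/",
                "ignoreEmptyBlobs": True,
                # Resource ID of the storage account that emits the events.
                "scope": storage_account_id,
                "events": list(events),
            },
            "pipelines": [
                {"pipelineReference": {"referenceName": pipeline_name,
                                       "type": "PipelineReference"}}
            ],
        },
    }


# Placeholder resource ID; fill in your own subscription, resource group, and account.
SCOPE = ("/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
         "/providers/Microsoft.Storage/storageAccounts/<storage-account>")

trigger = build_storage_event_trigger("BlobCreatedTrigger", "CopyInputToOutput",
                                      SCOPE, "demo", "input")
print(json.dumps(trigger, indent=2))
```

Passing `events=("Microsoft.Storage.BlobDeleted",)` to the same helper gives the shape of a trigger for the Blob Deleted event.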
After clicking on ‘Continue’, you will get ‘Data Preview’. This shows the blobs that match the event trigger filters, so you can verify whether the filter is correct. Click ‘Continue’ and you will see the ‘Parameters’ section. This is helpful when you want to pass parameters to the pipeline. We are not using parameters in this demo, so skip this and click ‘Ok’.
Now we have all the components in place, and the next step is to ‘Publish’ all the changes.
Once publishing is complete, let’s test the trigger.
Upload the file ‘emp.txt’ to the ‘input’ folder. This should fire the BlobCreated event, which in turn fires the ADF trigger.
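To get a feel for what ‘Data Preview’ is checking, here is a simplified sketch of the path filter logic. It is a model for illustration, not ADF's actual implementation, and the container name `demo` is hypothetical. The ‘begins with’ filter is assumed to use the `/<container>/blobs/<folder>/` format.

```python
def matches_trigger_filter(container: str, blob_name: str,
                           begins_with: str = "", ends_with: str = "") -> bool:
    """Return True if a blob would pass the 'begins with' and 'ends with' path filters.

    Simplified model: the blob is represented as /<container>/blobs/<blob_name>
    and compared against both filters. An empty filter matches everything.
    """
    path = f"/{container}/blobs/{blob_name}"
    return path.startswith(begins_with) and path.endswith(ends_with)


# Filters for this demo: only .txt files landing in the 'input' folder.
BEGINS_WITH = "/demo/blobs/input/"
ENDS_WITH = ".txt"

for blob in ["input/emp.txt", "input/emp.csv", "output/emp.txt"]:
    result = matches_trigger_filter("demo", blob, BEGINS_WITH, ENDS_WITH)
    print(f"{blob}: {'match' if result else 'no match'}")
```

Note that ‘output/emp.txt’ does not match. Scoping the filter to the ‘input’ folder matters here, because the pipeline itself creates a blob in ‘output’, and a filter that covered the whole container would fire the trigger again on the copied file.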
File copied to output folder:
ADF Trigger run:
As we see from the result screenshots above, the BlobCreated trigger works as expected and runs the attached ADF pipeline.
Similarly, a trigger for the BlobDeleted event can be created.
Hope this helps!