AND logic - Many Pre-processing Jobs - Part 1

Former Employee

Mar 29, 2022

Quite often, an ETL pipeline has multiple upstream sources: you need to copy a handful of data streams into a central place before kicking off the next stage of processing. You want to express these dependencies: an activity should wait for all its predecessors to finish before starting.

There are two ways to express this logic: (1) inline and (2) ExecutePipeline, each with its own strengths and shortcomings. Specifically, ExecutePipeline is the preferred way if you also want to add error handling with a common error handling job. We will discuss error handling in Part 2 of the series.

Express Multi-dependencies Inline

ADF pipelines naturally support logical AND conditions: you can connect many activities to a single downstream activity to express upstream dependencies. For instance, in this sample pipeline, Upstream1, Upstream2, and Upstream3 will kick off in parallel, and PostProcess will block until all upstream activities succeed. If any of the three upstream activities fails, the pipeline will fail, and PostProcess will never execute.
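In the pipeline JSON, each arrow becomes a dependsOn entry on the downstream activity. The Python sketch below builds such a definition as a plain dictionary, only to make the shape visible. The Wait activities are stand-ins for the real copy and processing activities, and the pipeline name InlineAndLogic and the helper wait_activity are made up for illustration.

```python
import json


def wait_activity(name, depends_on=()):
    """Build a placeholder Wait activity (a stand-in for a real activity)."""
    return {
        "name": name,
        "type": "Wait",
        # One dependsOn entry per incoming arrow. Listing several entries
        # means ALL of them must be satisfied before this activity starts.
        "dependsOn": [
            {"activity": upstream, "dependencyConditions": ["Succeeded"]}
            for upstream in depends_on
        ],
        "typeProperties": {"waitTimeInSeconds": 1},
    }


upstream_names = ["Upstream1", "Upstream2", "Upstream3"]

inline_pipeline = {
    "name": "InlineAndLogic",  # hypothetical pipeline name
    "properties": {
        # The three upstream activities have no dependencies, so they start
        # in parallel; PostProcess waits for all three to succeed.
        "activities": [wait_activity(n) for n in upstream_names]
        + [wait_activity("PostProcess", depends_on=upstream_names)]
    },
}

print(json.dumps(inline_pipeline, indent=2))
```

The printed JSON can be compared with the code view of a pipeline built in the ADF authoring UI; property names other than name, type, dependsOn, and typeProperties are omitted here for brevity.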

The upsides of this approach are:

simplicity: you can specify dependencies with a couple of arrows, and ADF will enforce the dependencies for you
individual error handler: each upstream job may require different error handling logic. This approach allows you to define individual error handling paths

When using error handling paths with the inline approach, be aware of the limit on the maximum number of activities per pipeline: every error handler counts toward it.
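To see how quickly individual handlers add up, here is a rough Python sketch that attaches one failure handler to each upstream activity and counts the result. The handler names (HandleUpstream1Failure and so on) and the helper functions are hypothetical, and Wait activities again stand in for real work.

```python
def activity(name, depends_on=(), condition="Succeeded"):
    """Build a placeholder Wait activity with the given dependency condition."""
    return {
        "name": name,
        "type": "Wait",
        "dependsOn": [
            {"activity": upstream, "dependencyConditions": [condition]}
            for upstream in depends_on
        ],
        "typeProperties": {"waitTimeInSeconds": 1},
    }


def inline_with_handlers(upstream_names):
    """Return the activity list for an inline pipeline with per-job handlers."""
    activities = []
    for name in upstream_names:
        activities.append(activity(name))
        # Each handler runs only when its own upstream activity fails.
        activities.append(activity(f"Handle{name}Failure", [name], "Failed"))
    # PostProcess still requires every upstream activity to succeed.
    activities.append(activity("PostProcess", upstream_names))
    return activities


activities = inline_with_handlers(["Upstream1", "Upstream2", "Upstream3"])
print(len(activities))  # 7: three upstream jobs, three handlers, one PostProcess
```

With n upstream jobs the pipeline holds 2n + 1 activities, so the count should be checked against the current per-pipeline limit in the ADF service limits documentation.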

Express Multi-dependencies with Execute Pipeline

Alternatively, you may want to move all upstream activities into a separate pipeline, and use ExecutePipeline to stitch the two pipelines together.

In the Upstream pipeline, Upstream1, Upstream2, and Upstream3 will kick off in parallel. The pipeline will succeed if all upstream activities succeed. If any of the three upstream activities fails, the pipeline will fail. You may use a Fail activity to surface detailed error messages from the Upstream pipeline.

In the main pipeline, ensure that Wait on completion is selected on the ExecutePipeline activity. PostProcess will then block until the Upstream pipeline succeeds.
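In pipeline JSON, the wait setting on the ExecutePipeline activity corresponds to the waitOnCompletion property. The Python sketch below builds both pipeline definitions as dictionaries to show how they fit together. The names Main and RunUpstream and the helper wait_activity are assumptions for illustration, and Wait activities stand in for the real jobs.

```python
import json


def wait_activity(name, depends_on=()):
    """Build a placeholder Wait activity (a stand-in for a real activity)."""
    return {
        "name": name,
        "type": "Wait",
        "dependsOn": [
            {"activity": upstream, "dependencyConditions": ["Succeeded"]}
            for upstream in depends_on
        ],
        "typeProperties": {"waitTimeInSeconds": 1},
    }


# Child pipeline: the three upstream jobs, with no dependencies between them.
upstream_pipeline = {
    "name": "Upstream",
    "properties": {
        "activities": [
            wait_activity(n) for n in ("Upstream1", "Upstream2", "Upstream3")
        ]
    },
}

# Parent pipeline: invoke the child, wait for it, then post-process.
main_pipeline = {
    "name": "Main",  # hypothetical pipeline name
    "properties": {
        "activities": [
            {
                "name": "RunUpstream",  # hypothetical activity name
                "type": "ExecutePipeline",
                "typeProperties": {
                    "pipeline": {
                        "referenceName": upstream_pipeline["name"],
                        "type": "PipelineReference",
                    },
                    # Without this, the activity would report success as soon
                    # as the child run is started, not when it finishes.
                    "waitOnCompletion": True,
                },
            },
            wait_activity("PostProcess", depends_on=["RunUpstream"]),
        ]
    },
}

print(json.dumps(main_pipeline, indent=2))
```

PostProcess now has a single dependency, so adding a fourth data source only changes the Upstream pipeline.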

The upsides of this approach are:

modularity: you can break the dependency graph into two parts and modify them separately (for instance, add another data source without touching the post-processing steps)
common error handler: if a shared error handling path is preferred, an Upon Failure path can be added to the ExecutePipeline activity to catch all errors in one place. Use the error payload from ExecutePipeline for error logging. We will discuss this in detail in Part 2
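A shared handler can be sketched as one extra activity that depends on the ExecutePipeline activity with the Failed condition. The Python snippet below is only an illustration: the names RunUpstream, LogUpstreamError, and errorMessage are hypothetical, and it assumes the error payload can be read with the expression @activity('RunUpstream').error.message. A real handler would likely write to a log store instead of setting a variable.

```python
import json

main_pipeline = {
    "name": "Main",  # hypothetical pipeline name
    "properties": {
        # Pipeline variable used by the handler below (assumed name).
        "variables": {"errorMessage": {"type": "String"}},
        "activities": [
            {
                "name": "RunUpstream",
                "type": "ExecutePipeline",
                "typeProperties": {
                    "pipeline": {
                        "referenceName": "Upstream",
                        "type": "PipelineReference",
                    },
                    "waitOnCompletion": True,
                },
            },
            {
                # Single handler for any failure inside the Upstream pipeline.
                "name": "LogUpstreamError",
                "type": "SetVariable",
                "dependsOn": [
                    {"activity": "RunUpstream", "dependencyConditions": ["Failed"]}
                ],
                "typeProperties": {
                    "variableName": "errorMessage",
                    "value": {
                        # Assumption: the failure message is exposed here.
                        "value": "@activity('RunUpstream').error.message",
                        "type": "Expression",
                    },
                },
            },
            {
                # Normal path: runs only when the Upstream pipeline succeeds.
                "name": "PostProcess",
                "type": "Wait",
                "dependsOn": [
                    {"activity": "RunUpstream", "dependencyConditions": ["Succeeded"]}
                ],
                "typeProperties": {"waitTimeInSeconds": 1},
            },
        ],
    },
}

print(json.dumps(main_pipeline, indent=2))
```

Compared with the inline approach, the number of handler activities stays at one no matter how many upstream jobs the Upstream pipeline contains.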

Updated Mar 29, 2022

Version 2.0

azure data factory

Azure ETL

ChenyeCharlieZhu


Azure Data Factory Blog
