Quite often, an ETL pipeline has multiple upstream sources: you need to copy a handful of data streams into a central place before kicking off the next stage of processing. You want to express these dependencies: an activity should wait for all of its predecessors to finish before starting.
There are two ways to express this logic: (1) inline and (2) ExecutePipeline, each with its own strengths and shortcomings. ExecutePipeline is the preferred approach if you also want to add error handling, with a common error handling job, to the logic. We will discuss error handling in Part 2 of this series.
ADF pipelines natively support logical AND conditions: you can connect multiple activities to a single activity to express upstream dependencies. For instance, in this sample pipeline, Upstream1, Upstream2, and Upstream3 kick off in parallel, and PostProcess blocks until all upstream activities succeed. If any of the three upstream activities fails, the pipeline fails, and PostProcess never executes.
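In the pipeline's JSON definition, this fan-in dependency is expressed through the activity's `dependsOn` list. A minimal sketch (activity names match the example above; the activity type and its `typeProperties` are placeholders):

```json
{
    "name": "PostProcess",
    "type": "Copy",
    "dependsOn": [
        { "activity": "Upstream1", "dependencyConditions": [ "Succeeded" ] },
        { "activity": "Upstream2", "dependencyConditions": [ "Succeeded" ] },
        { "activity": "Upstream3", "dependencyConditions": [ "Succeeded" ] }
    ]
}
```

Because every entry requires the `Succeeded` condition, PostProcess runs only when all three upstream activities complete successfully.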
The upsides of this approach are:
When using error handling paths with the inline approach, be aware of the limit on the maximum number of activities per pipeline.
Alternatively, you may want to move all upstream activities into a separate pipeline, and use ExecutePipeline to stitch them together.
In the Upstream pipeline, Upstream1, Upstream2, and Upstream3 kick off in parallel. The pipeline succeeds if all upstream activities succeed. If any of the three upstream activities fails, the pipeline fails. You can use the Fail activity to surface detailed error messages from the upstream pipeline.
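One way to surface a detailed message is a Fail activity wired to an upstream activity's `Failed` path. A hedged sketch (the activity name `SurfaceError` and the error code are illustrative; the expression assumes ADF's `@activity(...).error.message` syntax):

```json
{
    "name": "SurfaceError",
    "type": "Fail",
    "dependsOn": [
        { "activity": "Upstream1", "dependencyConditions": [ "Failed" ] }
    ],
    "typeProperties": {
        "message": "@activity('Upstream1').error.message",
        "errorCode": "500"
    }
}
```

The Fail activity forces the Upstream pipeline into a failed state while propagating the original error text, so the main pipeline sees a meaningful message rather than a generic failure.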
In the main pipeline, make sure Wait on completion is selected on the Execute Pipeline activity. PostProcess will block until the upstream pipeline succeeds.
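The main pipeline then reduces to two activities: an Execute Pipeline activity pointing at the Upstream pipeline, and PostProcess depending on it. A minimal sketch (names are placeholders; `waitOnCompletion` is the JSON counterpart of the Wait on completion checkbox):

```json
[
    {
        "name": "RunUpstream",
        "type": "ExecutePipeline",
        "typeProperties": {
            "pipeline": { "referenceName": "Upstream", "type": "PipelineReference" },
            "waitOnCompletion": true
        }
    },
    {
        "name": "PostProcess",
        "type": "Copy",
        "dependsOn": [
            { "activity": "RunUpstream", "dependencyConditions": [ "Succeeded" ] }
        ]
    }
]
```

With `waitOnCompletion` set to `true`, the Execute Pipeline activity does not report success until the child pipeline finishes, which is what makes the dependency on PostProcess meaningful.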
The upsides of this approach are: