Blog Post
Smart Pipelines Orchestration: Designing Predictable Data Platforms on Shared Spark
Great thought!
Just wondering: how can we extend this to use cases where different jobs are triggered from parallel pipeline executions and have no common parent pipeline?
Hello Paarath Gupta,
Yes, this can be made fully dynamic using metadata-driven pipelines. In this approach, you would define three parent pipelines based on workload type: Light, Medium, and Heavy.
Each parent pipeline reads from metadata and uses a ForEach activity to control parallelism. For example, the Light parent pipeline can trigger ~20 light workloads in parallel, the Medium parent a smaller number (e.g., 2), and the Heavy parent the heavy workloads at the lowest concurrency.
The master pipeline’s responsibility is only to orchestrate execution order by invoking these three parent pipelines. Each parent pipeline, in turn, is responsible for invoking its corresponding child pipelines based on metadata.
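To make the control flow concrete, here is a minimal Python sketch of this pattern. The class names, parallelism limits, and `run_child` placeholder are illustrative assumptions, not a fixed design; in a real platform, `run_child` would be an Execute Pipeline call and the parallelism limits would be the ForEach batch counts.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-class concurrency limits, mirroring the ForEach
# batch-count settings described above (values are illustrative).
PARALLELISM = {"light": 20, "medium": 2, "heavy": 1}


def run_child(pipeline_name):
    # Placeholder for invoking a child pipeline (e.g. an
    # Execute Pipeline activity in a real orchestrator).
    return f"done:{pipeline_name}"


def run_parent(weight_class, children):
    # Each parent pipeline fans out its children with a bounded
    # degree of parallelism, like a ForEach activity's batch count.
    with ThreadPoolExecutor(max_workers=PARALLELISM[weight_class]) as pool:
        return list(pool.map(run_child, children))


def run_master(metadata):
    # The master pipeline only sequences the three parents:
    # Light -> Medium -> Heavy. Each parent filters the shared
    # metadata down to its own weight class.
    results = []
    for weight_class in ("light", "medium", "heavy"):
        children = [m["name"] for m in metadata if m["weight"] == weight_class]
        results.extend(run_parent(weight_class, children))
    return results
```

The key design point is that the master never knows about individual workloads; adding a new job is purely a metadata change.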
The metadata can be stored as JSON, and an AI agent could be used to determine the weight class for each pipeline automatically.
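As a rough sketch, the metadata document might look like the JSON below; the field names (`name`, `weight`) and pipeline names are assumptions for illustration, not a fixed schema.

```python
import json

# Illustrative metadata document describing each pipeline and its
# weight class (pipeline names and fields are hypothetical).
METADATA_JSON = """
[
  {"name": "ingest_customers",  "weight": "light"},
  {"name": "transform_orders",  "weight": "medium"},
  {"name": "rebuild_warehouse", "weight": "heavy"}
]
"""


def pipelines_by_weight(doc, weight):
    # Each parent pipeline filters the shared metadata down to
    # the pipelines in its own weight class.
    return [p["name"] for p in json.loads(doc) if p["weight"] == weight]
```

An AI agent (or a simple heuristic on historical run times) would then be responsible only for writing the `weight` field, leaving the orchestration logic unchanged.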