Azure Data Factory (ADF) has added a time-to-live (TTL) option to the Data Flow properties of the Azure Integration Runtime (IR) to reduce the time it takes data flow activities to run.
This setting applies only to Data Flow activities executed by ADF pipeline runs. Debug executions from pipelines and data preview debugging continue to use the debug settings, which have a preset TTL of 60 minutes.
If you leave the TTL at 0, ADF always spawns a new Spark cluster environment for every Data Flow activity that executes. This means an Azure Databricks cluster is provisioned each time, and it takes about 5-7 minutes to become available and execute your job.
If you set a TTL, however, ADF maintains a pool of VMs that can be used to spin up each subsequent data flow activity against that same Azure IR. This reduces the time needed to start up the environment before your job executes.
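To make the reuse behavior concrete, here is a minimal, simplified model of which activities get a cold start versus a warm start. It is a sketch under stated assumptions, not ADF's actual scheduler: `Run`, `simulate`, `COLD_START_MIN` and `WARM_START_MIN` are hypothetical names, the 6-minute cold start is an assumed midpoint of the "about 5-7 minutes" above, the 1-minute warm start is a pure guess, runs are assumed to be sequential and non-overlapping, and the pool is assumed to stay warm for TTL minutes after the last activity finishes.

```python
from dataclasses import dataclass

# Hypothetical timings in minutes (assumptions, not published ADF figures):
# cold start = assumed midpoint of "about 5-7 minutes"; warm start = a guess.
COLD_START_MIN = 6.0
WARM_START_MIN = 1.0


@dataclass
class Run:
    trigger: float        # minute at which the Data Flow activity is triggered
    job: float            # minutes of actual Spark work once the cluster is ready
    start_kind: str = ""  # filled in by simulate(): "cold" or "warm"
    end: float = 0.0      # filled in by simulate(): minute the job finishes


def simulate(runs: list[Run], ttl: float) -> list[Run]:
    """Simplified model of sequential, non-overlapping runs on one Azure IR."""
    ordered = sorted(runs, key=lambda r: r.trigger)
    pool_expires = None  # minute the warm pool is torn down; None = no pool yet
    for run in ordered:
        # A run is warm only if TTL > 0 and the pool from the previous run
        # has not yet expired. The very first run is always cold, because
        # nothing is provisioned until the first activity executes.
        warm = ttl > 0 and pool_expires is not None and run.trigger <= pool_expires
        run.start_kind = "warm" if warm else "cold"
        startup = WARM_START_MIN if warm else COLD_START_MIN
        run.end = run.trigger + startup + run.job
        # Assumed rule: the pool lives for `ttl` minutes after this run ends.
        pool_expires = run.end + ttl if ttl > 0 else None
    return ordered


if __name__ == "__main__":
    for ttl in (0, 15):
        runs = simulate([Run(0, 10), Run(20, 10), Run(60, 10)], ttl)
        print(ttl, [(r.start_kind, r.end) for r in runs])
```

With a TTL of 0, all three runs start cold. With a TTL of 15, the second run (triggered at minute 20, while the pool from the first run is still alive until minute 31) starts warm and finishes five minutes earlier, while the third run arrives after the pool has expired and starts cold again.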
ADF keeps that pool alive for the TTL period after the last data flow pipeline activity executes. Note that this extends the billing period for a data flow by the length of your TTL. In return, your data flow activity execution time decreases because VMs from the compute pool are reused. The compute resources are not provisioned until the first data flow activity executes using that Azure IR.
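The billing trade-off can be sketched the same way. The following is a hypothetical, simplified model, not ADF's billing logic: `billed_windows` is an illustrative name, each activity is given as a `(start, end)` interval in minutes, and a pool is assumed to be billed from the first activity's start until TTL minutes after the last activity that reused it. Any activity that starts before that deadline is folded into the same window; with a TTL of 0, every activity is billed on its own.

```python
def billed_windows(intervals: list[tuple[float, float]], ttl: float) -> list[tuple[float, float]]:
    """Merge (start, end) activity intervals into billed pool windows.

    Hypothetical model: a pool stays up for `ttl` minutes after the last
    activity that used it ends; with ttl == 0 each activity is billed alone.
    """
    windows: list[list[float]] = []
    for start, end in sorted(intervals):
        if windows and ttl > 0 and start <= windows[-1][1]:
            # The pool is still alive: reuse it and push its expiry out.
            windows[-1][1] = max(windows[-1][1], end + ttl)
        else:
            # No live pool: a new one is provisioned for this activity.
            windows.append([start, end + ttl])
    return [(s, e) for s, e in windows]


if __name__ == "__main__":
    no_ttl = billed_windows([(0, 16), (20, 36), (60, 76)], ttl=0)
    with_ttl = billed_windows([(0, 16), (20, 31), (60, 76)], ttl=15)
    print(no_ttl, sum(e - s for s, e in no_ttl))
    print(with_ttl, sum(e - s for s, e in with_ttl))
```

In this toy example (hypothetical intervals in which the second job finishes earlier when TTL is 15 because it started warm), billed pool time grows from 48 minutes with no TTL to 77 minutes with a 15-minute TTL. That is the trade-off described above: faster activity start-up in exchange for paying for idle pool time.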