Forum Discussion
Need ADF pipeline suggestion
I have an ADF pipeline that copies files from a source to a destination. The source and destination are different folders within ADLS. My pipeline design is as follows:
1.) Lookup activity - Calls a SQL Server stored procedure that returns the source path and the destination path. This is connected to a ForEach activity.
2.) ForEach activity - Has a batch count of 10. Inside this activity I have a Copy data activity.
3.) Copy data activity - The source and sink paths are set from the stored procedure's output columns. Both the source and destination locations are ADLS Gen2.
It works fine, but the stored procedure returns about 1 million files, and it takes about 20 minutes to copy 1,000 rows/files. What settings or configuration can I change to make this run faster?
1 Reply
- petevern (Brass Contributor)
If the batch count controls parallel execution, have you tried increasing it to 30? I believe ADF allows a batch count of up to 50 on a ForEach activity.
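To put the numbers from the question in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes that throughput scales linearly with the batch count, which is an optimistic assumption and not something ADF guarantees. The function name and parameters are made up for illustration.

```python
def estimated_hours(total_files, files_per_window=1000, window_minutes=20,
                    current_batch_count=10, new_batch_count=10):
    """Estimate total copy time in hours.

    Assumption: throughput scales linearly with the ForEach batch count.
    Real pipelines may scale worse because of per-activity overhead.
    """
    # Observed rate from the question: 1,000 files every 20 minutes.
    files_per_minute = files_per_window / window_minutes
    # Scale the observed rate by the ratio of new to current batch count.
    scaled_rate = files_per_minute * (new_batch_count / current_batch_count)
    return total_files / scaled_rate / 60


for batch_count in (10, 30, 50):
    hours = estimated_hours(1_000_000, new_batch_count=batch_count)
    print(f"batchCount={batch_count}: ~{hours:.0f} hours")
# batchCount=10: ~333 hours
# batchCount=30: ~111 hours
# batchCount=50: ~67 hours
```

Even in the best case, a batch count of 50 still leaves the run at several days, so raising it alone is unlikely to be enough for 1 million files.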
Additionally, if there is a pattern in the file names, you could split them into chunks and run multiple pipelines simultaneously instead of relying solely on batch count for parallelism.
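As a minimal sketch of the chunking idea, the Python below groups the stored procedure's rows by the parent folder of the source path. The row shape (a pair of source and destination paths) and the sample paths are assumptions for illustration; your stored procedure's actual columns may differ.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_folder(rows):
    """Group (source_path, destination_path) rows by the source file's parent folder."""
    groups = defaultdict(list)
    for source_path, destination_path in rows:
        folder = str(PurePosixPath(source_path).parent)
        groups[folder].append((source_path, destination_path))
    return dict(groups)


# Hypothetical rows, standing in for the Lookup activity's output.
rows = [
    ("raw/2024/01/a.csv", "curated/2024/01/a.csv"),
    ("raw/2024/01/b.csv", "curated/2024/01/b.csv"),
    ("raw/2024/02/c.csv", "curated/2024/02/c.csv"),
]

for folder, chunk in group_by_folder(rows).items():
    print(folder, len(chunk))
```

Each group could then be handed to its own pipeline run, so that several runs work on different folders at the same time.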
It's unclear whether the 1 million files are new each time or not. If they are not, consider copying only the new or updated files based on their last-modified date.
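The incremental idea can be sketched in Python as follows, assuming you keep the timestamp of the last successful run somewhere and can list each file's last-modified time. The field names `path` and `last_modified` and the sample data are hypothetical.

```python
from datetime import datetime, timezone


def files_changed_since(files, last_run):
    """Return only the entries modified after the last successful run."""
    return [f for f in files if f["last_modified"] > last_run]


# Hypothetical file listing and watermark, for illustration only.
last_run = datetime(2024, 1, 15, tzinfo=timezone.utc)
files = [
    {"path": "raw/a.csv", "last_modified": datetime(2024, 1, 10, tzinfo=timezone.utc)},
    {"path": "raw/b.csv", "last_modified": datetime(2024, 1, 20, tzinfo=timezone.utc)},
]

to_copy = files_changed_since(files, last_run)
print([f["path"] for f in to_copy])  # ['raw/b.csv']
```

The Copy activity's source settings for file stores also include a last-modified filter, which may cover this without custom code; it is worth checking the ADLS Gen2 connector documentation for the exact property names.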