Forum Discussion

CzarR
Copper Contributor
Dec 18, 2024

Need ADF pipeline suggestion

I have an ADF pipeline that copies files from a source to a destination. The source and the destination are different folders within ADLS. My pipeline design is as follows:

 

1.) Lookup activity - A SQL Server stored procedure that returns the source path and the destination path. This is connected to a ForEach activity.

2.) ForEach activity - The batch count is set to 10. It contains a Copy data activity.

3.) Copy data activity - The source and sink paths are set from the stored procedure's output columns. Both the source and the destination are ADLS Gen2.

 

It works fine, but the stored procedure returns about 1 million files, and it takes about 20 minutes to copy 1,000 rows/files. What settings or configuration can I change to make this run faster?

1 Reply

  • petevern
    Brass Contributor

    If the batch count controls parallel execution, have you tried increasing it to 30? I believe ADF allows up to 50 concurrent executions.
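    To put the numbers from the question in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the figures stated above (about 20 minutes per 1,000 files, 1 million files, batch count 10) and assumes, optimistically, that throughput scales linearly with the batch count. That assumption is mine and may not hold in practice, so treat the results as a best case rather than a prediction.

```python
# Back-of-the-envelope estimate from the figures in the question:
# about 20 minutes per 1,000 files, 1,000,000 files total, batch count 10.
FILES_TOTAL = 1_000_000
FILES_PER_RUN = 1_000
MINUTES_PER_RUN = 20
CURRENT_BATCH_COUNT = 10


def estimated_hours(batch_count: int) -> float:
    """Estimated total hours, ASSUMING throughput scales linearly with batch count."""
    minutes_at_current = FILES_TOTAL / FILES_PER_RUN * MINUTES_PER_RUN
    return minutes_at_current * CURRENT_BATCH_COUNT / batch_count / 60


if __name__ == "__main__":
    for batch_count in (10, 30, 50):
        hours = estimated_hours(batch_count)
        print(f"batch count {batch_count}: ~{hours:.0f} hours (~{hours / 24:.1f} days)")
```

    At the current rate the full set works out to roughly 333 hours, or about two weeks, which is why parallelism alone may not be enough.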

    Additionally, if there is a pattern in the file names, you could split them into chunks and run multiple pipelines simultaneously instead of relying solely on batch count for parallelism.
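    As an illustration of the chunking idea, here is a minimal Python sketch that groups file paths by the leading characters of the file name. The function name, the prefix rule, and the sample paths are all hypothetical; the right split depends on whatever pattern your file names actually follow. Each resulting group could then be handed to its own pipeline run.

```python
import posixpath
from collections import defaultdict


def split_by_prefix(paths, prefix_length=1):
    """Group paths by the first prefix_length characters of the file name."""
    chunks = defaultdict(list)
    for path in paths:
        file_name = posixpath.basename(path)
        # Lowercase the prefix so "Order_..." and "order_..." share a chunk.
        chunks[file_name[:prefix_length].lower()].append(path)
    return dict(chunks)


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    sample = [
        "raw/2024/invoice_001.csv",
        "raw/2024/invoice_002.csv",
        "raw/2024/order_001.csv",
        "raw/2024/Order_002.csv",
    ]
    for prefix, group in sorted(split_by_prefix(sample).items()):
        print(prefix, len(group))
```

    If the prefixes turn out to be unevenly distributed, a longer prefix or a different key may give more balanced chunks.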

    It's unclear whether the 1 million files are new each time or not. If they are not, consider copying only the new or updated files, based on the modified date.
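
    The incremental idea can be sketched in Python as a simple filter against a watermark, meaning the time of the last successful copy. The function name and the shape of the input (pairs of path and last-modified time) are assumptions for illustration; in practice the list and the watermark would come from wherever your stored procedure tracks them.

```python
from datetime import datetime, timezone


def files_modified_since(files, watermark):
    """Return paths whose last-modified time is strictly after the watermark.

    files is an iterable of (path, modified) pairs, where modified is a
    timezone-aware datetime. Both names are hypothetical.
    """
    return [path for path, modified in files if modified > watermark]


if __name__ == "__main__":
    # Hypothetical data, for illustration only.
    last_run = datetime(2024, 12, 17, tzinfo=timezone.utc)
    listing = [
        ("raw/a.csv", datetime(2024, 12, 16, 8, 0, tzinfo=timezone.utc)),
        ("raw/b.csv", datetime(2024, 12, 18, 9, 30, tzinfo=timezone.utc)),
    ]
    print(files_modified_since(listing, last_run))
```

    After each successful run, the watermark would be advanced so the next run only picks up files changed since then.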