Forum Discussion

grames
Copper Contributor
Jun 03, 2026

ADF - REST API Copy Data Activity - Best Practices

Hey everyone,

I'm relatively new to Azure and am using Azure Data Factory (ADF) to extract data from Vonigo via their API.

Everything is working, but I'm looking for ways to improve efficiency. Right now it takes 30+ minutes to pull all franchise data for a given report.

The challenge is that Vonigo only allows reports to be run for one franchise at a time, and switching franchises requires a separate API call. My current process is:

  1. Select a franchise from a list
  2. Run all required reports for that franchise
  3. Save the results to Blob Storage
  4. Move to the next franchise and repeat

(We'll eventually ingest the files into our silver layer for transformation.)

Speed Issue

One thing that significantly slows the pipeline down is that many activities are running sequentially. If I disable sequential execution, I've seen cases where data gets written to the wrong destination or associated with the wrong franchise.

Has anyone successfully parallelized a similar process while maintaining data integrity? Are there specific points in the workflow where parallel execution would be safe?

Pagination / Loop Issue

Originally, I used a Lookup activity to inspect the most recently created file. An If activity would then determine whether the file contained any records:

  • If records existed, increment the page number and continue.
  • If no records existed, end the loop.

This worked, but the Lookup activity added noticeable overhead.

To improve performance, I changed the logic to use the Copy Activity output instead. Specifically, I'm checking the amount of data read from the last API call. Pages with no records appear to consistently return the same data-read value, so I use that to determine when to stop paging.

This approach is much faster, but it feels more fragile since it's relying on an indirect indicator rather than the actual record count.

Would you trust the Copy Activity output in this scenario, stick with the Lookup approach, or recommend a different pattern altogether?

Thanks for any suggestions.

3 Replies

  • aziz-saiji
    Copper Contributor

    Hi,

    You are already on the right track, and your observations are valid. There are two main areas to improve here: safe parallelism and pagination strategy.

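    First, regarding parallelism and data integrity. The issue you are seeing when disabling sequential execution is very likely caused by shared state being reused across iterations. In ADF, pipeline variables are scoped to the whole pipeline, not to a single ForEach iteration, so a Set Variable inside a parallel ForEach can be overwritten by another iteration before it is read. The same applies on the API side: if switching franchises changes state tied to a session or token, parallel iterations that share that token will also interfere with each other, so each iteration would need its own.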

    A more reliable approach is to use a ForEach activity with Sequential unchecked and a Batch count set (the default is 20, the maximum is 50), where each iteration processes one franchise independently. Do not set pipeline variables inside the loop; rely on item() and pipeline parameters instead. Also make sure your sink path is generated dynamically and is unique per franchise and per run.

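    For example, write files to a path like: franchise_{franchiseId}/report_{runId}.json. Since you run several reports and several pages per franchise, the report name and page number should be part of the path as well, e.g. franchise_{franchiseId}/{reportName}_{runId}_page{pageNumber}.json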

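    A unique path per iteration keeps parallel runs from overwriting each other's files. Keep pagination sequential within each franchise, since each page request depends on the result of the previous one, and run the franchises themselves in parallel. Note that ADF does not allow an Until loop directly inside a ForEach, so the paging loop typically lives in a child pipeline that the ForEach calls through Execute Pipeline, passing the franchise as a parameter; the child's variables are then private to each run.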
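    To make the isolation idea concrete, here is a rough sketch of the control flow in Python rather than ADF JSON. It is only a model of the pattern, not an ADF implementation: FAKE_API, fetch_page, sink_path and the page suffix in the file name are all made up for illustration, and a real pipeline would call Vonigo and write to Blob Storage instead.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the API: franchise id -> list of pages,
# each page being a list of records. Not real Vonigo data or endpoints.
FAKE_API = {
    "F001": [[{"id": 1}, {"id": 2}], [{"id": 3}]],
    "F002": [[{"id": 10}]],
}


def fetch_page(franchise_id, page):
    """Return the records for one page, or an empty list past the last page."""
    pages = FAKE_API[franchise_id]
    return pages[page - 1] if page <= len(pages) else []


def sink_path(franchise_id, run_id, page):
    """Build a path that is unique per franchise, per run, and per page."""
    return f"franchise_{franchise_id}/report_{run_id}_page{page}.json"


def process_franchise(franchise_id, run_id):
    """Page through one franchise sequentially, using only local state."""
    written = {}
    page = 1  # local to this call, so parallel calls cannot overwrite it
    while True:
        records = fetch_page(franchise_id, page)
        if not records:
            break
        written[sink_path(franchise_id, run_id, page)] = records
        page += 1
    return written


def run_all(franchise_ids, run_id, max_parallel=4):
    """Run franchises in parallel; each franchise pages sequentially."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        per_franchise = pool.map(
            lambda f: process_franchise(f, run_id), franchise_ids
        )
        for written in per_franchise:
            results.update(written)
    return results
```

    The point is that the page counter lives inside process_franchise, which plays the role of the per-franchise child pipeline, and the output path is derived only from values passed into that call. Nothing is shared between franchises, so the order in which they finish does not matter.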

    Second, regarding pagination and loop control. Moving away from Lookup was a good decision from a performance perspective. However, using data read size as a condition is not very robust.

    A better approach is to use Copy Activity output metadata, especially rowsCopied. You can stop your loop when rowsCopied equals zero, which is more reliable than checking the data size.
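    As a small illustration of that stop condition, again in Python just to show the logic: the sketch below loops until a page reports zero rows and adds a page cap as a safety net. The output dictionaries are invented sample values that mimic the rowsCopied and dataRead fields of a Copy activity output; fake_copy_page and page_until_empty are hypothetical names.

```python
# Invented copy outputs for pages 1..4; page 4 is the empty one.
# dataRead on the empty page is not zero here because the response is
# assumed to still carry a small JSON envelope.
FAKE_OUTPUTS = {
    1: {"rowsCopied": 100, "dataRead": 20480},
    2: {"rowsCopied": 100, "dataRead": 20511},
    3: {"rowsCopied": 37, "dataRead": 7602},
    4: {"rowsCopied": 0, "dataRead": 58},
}


def fake_copy_page(page):
    """Stand-in for running the Copy activity for one page."""
    return FAKE_OUTPUTS[page]


def page_until_empty(copy_page, max_pages=1000):
    """Call copy_page(page) until it reports rowsCopied == 0.

    Returns the number of pages that contained data. Raises if no empty
    page shows up within max_pages, so a bad stop signal cannot loop forever.
    """
    for page in range(1, max_pages + 1):
        output = copy_page(page)
        if output["rowsCopied"] == 0:
            return page - 1
    raise RuntimeError(f"no empty page after {max_pages} pages")
```

    Before relying on this, it is worth looking at the run output of one known-empty page to confirm that it really reports 0 rows with your source mapping.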

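    If the API supports it, an even better option is to rely on response metadata such as a hasMore flag or a next-page token, which is more stable than inferring the end of the data from indirect indicators. In that case it is also worth checking whether the REST connector's built-in pagination rules can handle the paging inside a single Copy activity, which would remove the loop altogether.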
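    If the API does expose something like that, the loop could look roughly like the sketch below. I don't know what Vonigo actually returns, so the records, hasMore and nextToken field names and the fake pages are assumptions for illustration only.

```python
# Hypothetical responses keyed by page token; None means "first page".
PAGES = {
    None: {"records": [1, 2], "hasMore": True, "nextToken": "t2"},
    "t2": {"records": [3, 4], "hasMore": True, "nextToken": "t3"},
    "t3": {"records": [5], "hasMore": False},
}


def fake_get_page(token):
    """Stand-in for one API call that returns a parsed JSON body."""
    return PAGES[token]


def fetch_all(get_page):
    """Follow next-page tokens until the API says there is no more data."""
    records = []
    token = None  # None requests the first page
    seen = set()
    while True:
        body = get_page(token)
        records.extend(body["records"])
        if not body.get("hasMore"):
            break
        token = body["nextToken"]
        # Guard against an API that keeps handing back the same token.
        if token in seen:
            raise RuntimeError("API returned a repeated page token")
        seen.add(token)
    return records
```

    Here the stop decision comes from what the API itself says about the next page, so an empty or oddly sized response cannot end the loop early or keep it running.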