Forum Discussion
How to handle Azure Data Factory Lookup activity with more than 5000 records
DynamicsHulk, you can create a new pipeline with two counter variables, iterations and count, both initialized to 0.
First, determine how many iterations you need by looking at the total number of records in the source.
In your query, take that total count, divide it by 5000, and round the result up (with integer division, dividing and then adding one achieves this). Set the iterations variable to the result with a Set Variable activity, as sketched below.
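As a rough sketch of that calculation (assuming a preceding Lookup activity named LookupTotalCount with "First row only" enabled that returns a TotalCount column, String-typed pipeline variables, and a 5000-row page size; all names here are placeholders), the Set Variable expression for iterations could look like:

    @string(add(div(int(activity('LookupTotalCount').output.firstRow.TotalCount), 5000), 1))

Because div() performs integer division when both operands are integers, adding one rounds the result up; if the total happens to be an exact multiple of 5000 you simply get one extra, empty iteration at the end.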
Within the pipeline, use an Until loop. The loop condition should be @equals(variables('count'), variables('iterations')).
Inside the loop, perform a Lookup whose source query uses the count variable to compute an offset (count multiplied by the page size of 5000). If you are fetching data from a Dataverse table, order the results (for example with an ORDER BY or orderby clause) so that each page is consistent between iterations. After the Lookup, increment the count variable by one. This lets you retrieve the data in smaller chunks and avoid the 5000-row limitation; a sketch of the loop body follows below.
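Here is a rough sketch of the loop body, assuming a SQL-style source that supports OFFSET/FETCH, a hypothetical table MyTable ordered by an Id column, and String-typed count and countTemp variables (for a Dataverse source the same paging idea would be expressed with FetchXML paging attributes instead):

    Lookup source query (dynamic content):
    SELECT * FROM MyTable ORDER BY Id
    OFFSET @{mul(int(variables('count')), 5000)} ROWS FETCH NEXT 5000 ROWS ONLY

    Increment the counter with two Set Variable activities:
    countTemp = @string(add(int(variables('count')), 1))
    count     = @variables('countTemp')

The intermediate countTemp variable is there because a Set Variable activity cannot reference the variable it is setting. Also remember to uncheck "First row only" on this Lookup so it returns the full page of rows.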
Another option is batch processing with a ForEach activity combined with custom batching logic, so that the data is processed in smaller batches instead of fetching all records at once.
For example, to fetch 100 rows at a time, multiply the batch size (100) by the current batch index (a variable, a parameter, or the ForEach item) to determine the offset, and implement your per-batch logic inside the ForEach loop (see the sketch after the next paragraph).
This approach effectively overcomes the limitations of the Lookup activity by breaking down the data into manageable chunks.
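A rough sketch of that pattern, assuming a hypothetical integer pipeline parameter batchCount that already holds the number of batches, a batch size of 100, and the same placeholder table as above:

    ForEach Items: @range(0, pipeline().parameters.batchCount)
    Query inside the loop:
    SELECT * FROM MyTable ORDER BY Id
    OFFSET @{mul(item(), 100)} ROWS FETCH NEXT 100 ROWS ONLY

range() produces the array [0, 1, ..., batchCount - 1] and item() is the current batch index, so each pass reads the next 100-row slice. Keep in mind that a ForEach runs its iterations in parallel by default; switch it to Sequential if the batches must be processed in order.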
You can read more about pipelines and activities in Azure Data Factory at https://learn.microsoft.com/en-us/azure/data-factory/concepts-pipelines-activities?tabs=data-factory&WT.mc_id=%3Fwt.mc_id%3Dstudentamb_357518
See the All about BI YouTube video at https://www.youtube.com/watch?v=Y4yFXGbhYZI
Note:
If this post is helpful, please give my response a thumbs up! You can also mark it as the solution to help others find it easily.
Thanks