How to handle azure data factory lookup activity with more than 5000 records

Copper Contributor

Hello Experts,

 

The DataFlow Activity successfully copies data from an Azure Blob Storage .csv file to Dataverse Table Storage. However, an error occurs when performing a Lookup on the Dataverse due to excessive data. This issue is in line with the documentation, which states that the Lookup activity has a limit of 5,000 rows and a maximum size of 4 MB.

 

Also, there is a Workaround mentioned (Micrsofot Documentation): Design a two-level pipeline where the outer pipeline iterates over an inner pipeline, which retrieves data that doesn't exceed the maximum rows or size.

 

How can I do this? Is there a way to define an offset (e.g. only read 1000 rows)

 

Thanks,

-Sri

1 Reply

@DynamicsHulk you can create a new pipeline with two integer variables, iterations, and count, initialized to 0.

 

First, determine the necessary number of iterations. You can achieve this by looking at the total number of datasets.

In your query, divide the total count by 5000, add one, and round it upwards. Set the value of the iterations variable to this calculated value using the Set Variable activity.

Within the pipeline, use an Until loop. The loop condition should be @equals(variables('count'), variables('iterations')).

 

Inside the loop, perform a lookup using the count variable as your offset. If fetching data from a Dataverse table, consider using the ORDER BY clause to ensure consistent results. Increment the count variable by one after each iteration. This approach lets you retrieve data in smaller chunks, avoiding the 5000-row limitation.

 

Another option is using batch processing with ForEach Activity in combination with custom batch-processing logic. Process the data in smaller batches instead of fetching all records at once.

 

For example, to fetch 100 rows at a time, multiply the batch size (100) by the count variable (or a parameter) to determine the offset. Implement your logic within the ForEach loop to handle each batch of data.

 

This approach effectively overcomes the limitations of the Lookup activity by breaking down the data into manageable chunks.

 

You can read more about Pipelines and activities in Azure Data Factory and Azure Synapse Analytics

See All about BI youtube video on Handle Lookup with more than 5000 records using Loop

 

Note:

If this post is helpful, please give my response a thumbs up! You can also mark it as the solution to help others find it easily.

 

Thanks