SOLVED

Power Query Rows Loaded Count

Copper Contributor

Good day.

 

I have filtered queries that merge into a single query. The end result in Excel is about 20,000 rows. When refreshing, it counts up to 3.6 million rows.

 

This is problematic for speed - but I cannot figure out where the problem spot is. I say "problem" because the query loads about 20,000 rows per second until about 3.1 million rows... then slows to about 100 rows per second until about 3.4 million, when the speed returns to about 20,000 per second.

 

Questions:

- When refreshing a query (or in preview mode in the Power Query Editor), what is "Rows Loaded" actually counting? (If I have filtered rows, why does it count so many more?)

- What would explain the variable speeds in refreshing?

- How can I troubleshoot to find what is slowing the refresh?

- Should I change how I think about this in any way?

 

Thank you for any help.

3 Replies
best response confirmed by PowerTripQuery (Copper Contributor)
Solution

@PowerTripQuery 

Power Query doesn't keep resulting tables in memory; it re-evaluates each query every time it is called.

 

Let's assume you have:

Query1 with 1M rows

Query2 with 1M rows

Query3, which merges Query1 and Query2 and gives another 1M rows

Q4 (a query or a step, it doesn't matter), which filters Query3 down to 100 rows

 

When you call Q4, it evaluates Query1, then Query2, then Query3, which means that at this step you are loading at least 3M rows.
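As a minimal sketch in M (the key and column names are hypothetical), Q4 is just an expression that references Query3, so evaluating Q4 re-evaluates everything upstream:

    // Query3 - merges Query1 and Query2 on a hypothetical "Key" column
    let
        Merged = Table.NestedJoin(Query1, {"Key"}, Query2, {"Key"}, "Q2Table", JoinKind.LeftOuter),
        Expanded = Table.ExpandTableColumn(Merged, "Q2Table", {"Value"})
    in
        Expanded

    // Q4 - filtering Query3 still pulls Query1 and Query2 first
    let
        Filtered = Table.SelectRows(Query3, each [Value] > 100)
    in
        Filtered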

 

How to minimize this, if possible, depends very much on the concrete queries and steps.

My example, and to check my understanding:

All merges are "Left".
Q1 - 197K rows from the database. Columns removed. Filtered to 18K rows.
Merged with...
Q2 - 285K rows from the database. Columns removed, BUFFERED, grouped to 197K rows.
Q3 - 75K rows from the database. Columns removed.
Q4 - 2.9M rows from the database. Filtered, columns removed, BUFFERED. 65K remain.
Q5 - 45K rows from the database. Columns removed, filtered to 42K rows, BUFFERED.
Q6 - complicated. Made of two queries slimmed similarly to the others above, then grouped to only 450 rows. Their database rows are 167K and 53K.

What is being counted by "rows loaded" is the number of rows pulled from the database. If a query utilizes other queries, each of those still goes to the database, and the row count recorded is from that initial database call - not the filtered amounts from the subqueries. Thus, my "rows loaded" will be 197K + 285K + 2.9M + 45K + 167K + 53K = 3.647M rows. Filtering then takes place, resulting in the final row count shown in the spreadsheet.

Interestingly enough, as I checked the row counts, it appears that the sum of the database row counts from Q1-Q4 falls in the "fast" counting period, and Q4 may be where I need to try to change something to improve speed.

Does my restatement of what you said, using my row counts, make sense?

@PowerTripQuery 

Something like that. In some cases Table.Buffer could help to reuse the cache, but not necessarily. In the majority of cases the cache will be re-evaluated.
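A minimal sketch of where Table.Buffer would sit (the query and column names are hypothetical):

    // Pin the slimmed-down table in memory so a later merge can reuse it
    let
        Source = Query2,
        Slim = Table.SelectColumns(Source, {"Key", "Value"}),
        Buffered = Table.Buffer(Slim)
    in
        Buffered

One caveat: Table.Buffer stops query folding for anything downstream of it, so apply it only after the filters and column removals you want the server to handle.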

If you take data from a database, especially a SQL database, pay attention to query folding. It allows all calculations to be done on the server, which is much faster.

If query folding generally works in your setup, check which steps break query folding and, where possible, try to modify the query to avoid such steps.
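As an illustrative sketch (server, database, and table names are hypothetical), steps like these typically fold into a single SQL statement:

    let
        Source = Sql.Database("myServer", "myDatabase"),
        Sales = Source{[Schema = "dbo", Item = "Sales"]}[Data],
        // Simple row filters and column selection usually fold to the server
        Filtered = Table.SelectRows(Sales, each [Amount] > 0),
        Slim = Table.SelectColumns(Filtered, {"OrderID", "Amount"})
    in
        Slim

In recent versions you can right-click the last applied step and choose "View Native Query" - if that option is greyed out, folding has been broken at or before that step.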
