Forum Discussion
MatthiasVC
Jun 16, 2023Copper Contributor
PowerQuery Table Buffer at end of query
Hi, I've got a PowerQuery File from a colleague that goes very slow (even though it's only 3.000 rows and 10 columns) and I'm intent on making it run smoother.
One thing that I noticed is that there are many queries dependencies which I presume result in a lot of double calculations. (time-intensive ones such as merge and sort).
Just to test things out I wanted to add a buffer at the end of some Query steps but I noticed it didn't really help much:
e.g.
My test
I have following Result Tables: T1 (30s loading time), T2 (utilizes T1 and has 2m20 loading time when refreshing concurrent with T1)
I added a third Table which does nothing more than say "=T2"
now if I refresh them all concurrently they take 30s,2m20s and 3m40s respectively.
If I drop the loading of T2 and only load T1 and T3 it takes 30s and 2m20.
What I understood
This confirms both that the step "=T2" really doesn't take any time at all (2m20s vs 2m20s) and that all queries are loaded independently.
What I don't understand
However, what I didn't mention is that I added a Table.Buffer as the Last step of T2.
Adding this step should result T3 being able to be loaded straight from that buffer instead of requiring extra calculation and thus it SHOULD take 30s, 2m20s 2m20s. Why doesn't it? Are the results of Table.Buffer really only available within the query itself?
If T1=>T2, T1=>T3, T2=>T4 and T3=> T4 where T1,T2,T3 are connection only and T4 is loaded to the workbook. I understand that T1 is calculated twice (once for calculating T2 and once for calculating T3), but does it still need to be calculated twice if you put a Table.Buffer at the end of it?
Please check Why Does Power BI Query My Data Source More Than Once? - Chris Webb's BI Blog (crossjoin.co.uk), second part of the post is about your case. Also Referenced Queries And Caching In Power BI And Power Query - Chris Webb's BI Blog (crossjoin.co.uk)
Sorry, but I can't explain better than Chris.
Also he has series of post regarding Power Query performance, perhaps something will be useful. First what Google returned Power Query/M Optimisation: Getting The Maximum Value From A Column, Part 2 - Chris Webb's BI Blog (crossjoin.co.uk)
4 Replies
Sort By
Power Query doesn't keep resulting query tables. In general if you have referenced queries each of them evaluating from scratch, from first query in line to resulting one. Cache helps in evaluation, but depends how your lines of references are built.
Table.Buffer keeps resulting query table in memory which makes such operations as sorting and removing duplicates more reliable. In 90% of cases.
Equivalent of Table.Buffer is to add Index column to table, perform operations as above and remove such Index column.
- MatthiasVCCopper ContributorOkay Thanks, That's already part of my question.
Then I guess the next part is: does Table.Buffer work if you're referring to a certain step multiple times? Or is that just exactly the same as referencing another query?
As in If you have following setup (and no query folding as it's all excel based sources):
Step 1) GetDataFromSource1 (takes a couple of seconds)
Step 2) GetDataFromSource2 (takes a couple of seconds)
Step 3) Combination of Step 1 and Step 2
Step 4) Combination of Step 1 and Step 2
Step 5) Combination of Step 3 and Step 4
In this setup Step 1 and Step 2 are performed both twice to get Step 5. Can this be circumvented by adding a Table.Buffer at step 1 and step 2?
Based on what you wrote I now believe that Table.Buffer is not at all what I thought it was. I thought it was a temporary save of the Table, (which it kind of is) but it's only within a single step to improve the speed of some very specific functions, not when other queries (or query steps) reference to the steps. Correct?Please check Why Does Power BI Query My Data Source More Than Once? - Chris Webb's BI Blog (crossjoin.co.uk), second part of the post is about your case. Also Referenced Queries And Caching In Power BI And Power Query - Chris Webb's BI Blog (crossjoin.co.uk)
Sorry, but I can't explain better than Chris.
Also he has series of post regarding Power Query performance, perhaps something will be useful. First what Google returned Power Query/M Optimisation: Getting The Maximum Value From A Column, Part 2 - Chris Webb's BI Blog (crossjoin.co.uk)