Excel and Power Query: PQ is taking hours to execute

Hi, everyone:

 

I am new to Power Query and not sure why it is taking hours to execute the code below.  Because the data is sensitive, I am unable to share the file.  However, I can say that the file contains 300K+ records, and the following are the PQ queries I am executing (note: variable names have been modified):

 

= Table.NestedJoin(#"Transactions", {"Account No."}, #"Orders", {"Acct"}, "Orders", JoinKind.LeftOuter)
 
= Table.AddColumn(#"Transactions", "Merge by Date Range", each Table.SelectRows(#"Orders", (x) => x[Ship Date] >= [#"Post Date - 30"] and x[Ship Date] <= [#"Post Date + 30"] and x[#"Acct"] = [#"Account No."]))
 
= Table.ExpandTableColumn(#"Added Custom", "Merge by Date Range", {"Acct", "Exp Type Name", "Product Description", "Units", "Retail", "Ship Date"}, {"Acct", "Exp Type Name", "Product Description", "Units", "Retail", "Ship Date"})
 
= Table.Group(#"Expanded Merge by Date Range", {"Acct", "Ship Date"}, {{"AllData", each _, type table [#"Post Date - 30"=nullable date, #"Post Date"=nullable date, #"Post Date + 30"=nullable date, #" Account No."=nullable text, Type=nullable text, Amount=nullable number, Acct=nullable text, Exp Type Name=nullable text, Product Description=nullable text, Units=nullable number, Retail=nullable number, Ship Date=nullable date]}, {"Sum of Retail", each List.Sum([Retail]), type nullable number}})
 
= Table.ExpandTableColumn(#"Grouped Rows", "AllData", {"Post Date - 30", "Post Date", "Post Date + 30", "Account No.", "Type", "Amount", "Acct", "Exp Type Name", "Product Description", "Units", "Retail", "Ship Date"}, {"Post Date - 30", "Post Date", "Post Date + 30", " Account No.", "Account Type", "Amount", "Acct.1", "Exp Type Name", "Product Description", "Units", "Retail", "Ship Date.1"})
 
= Table.RemoveColumns(#"Expanded AllData",{"Acct", "Ship Date"})
 
= Table.RenameColumns(#"Removed Columns",{{"Acct.1", "Account No."}, {"Ship Date.1", "Ship Date"}})
 
= Table.Sort(#"Renamed Columns",{{"Account No.", Order.Ascending}, {"Post Date", Order.Ascending}, {"Ship Date", Order.Ascending}, {"Retail", Order.Descending}})
 
I have since learned about Table.Buffer, so I applied it to every statement.  But it has not helped.  What else do I need to do to make this quicker?
 
Thank you,
Cindy
7 Replies

@Cindysc1218 It may be difficult to judge, based on the code alone, but it seems that in the second step you add a column with filtered content. If I'm not mistaken, that creates a column with 300K identical tables of filtered records, that you then Expand and Group. Could you not just use two filter steps (i.e. Table.SelectRows) and end with something like this?

 

Filter1 = Table.SelectRows(#"Orders", each [Ship Date] >= [#"Post Date - 30"] and [Ship Date] <= [#"Post Date + 30"]),

Filter2 = Table.SelectRows(Filter1, each [Acct] = [#"Account No."])

 

Not being able to test any of this, I may be missing the point altogether.

I had to research how to write that second step in order to get what I need.

What I need to do is merge (left join) two tables using the account and the dates as my keys. I understand that I do an exact match on accounts to merge the tables, but I cannot do an exact match on dates, which is why I am attempting to match when Ship Date falls within 30 days before or after the Post Date. After spending much time online, the code in step 2 appears to do what I need. Are you saying that your solution can do the same thing?
This is also a one-to-many join.
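
For readers following along, the requirement described here (exact match on account, range match on dates, one-to-many) can be sketched by merging on the account first and then filtering each transaction's nested orders by date, so that each row scans only its own account's orders rather than the whole Orders table. This is a minimal sketch, not the poster's actual query: the sample tables, the step names, and the use of Date.AddDays in place of the precomputed "Post Date - 30" and "Post Date + 30" columns are all assumptions made for illustration.

```powerquery
let
    // Hypothetical sample data; the real tables are not shown in the thread
    Transactions = #table(
        type table [#"Account No." = text, #"Post Date" = date, Amount = number],
        {
            {"A1", #date(2023, 3, 15), 100},
            {"B2", #date(2023, 6, 1), 250},
            {"C3", #date(2023, 7, 4), 75}
        }
    ),
    Orders = #table(
        type table [Acct = text, #"Ship Date" = date, Retail = number],
        {
            {"A1", #date(2023, 3, 1), 40},
            {"A1", #date(2023, 9, 1), 60},
            {"B2", #date(2023, 6, 20), 250}
        }
    ),
    // Step 1: exact-match left join on the account key only
    Merged = Table.NestedJoin(
        Transactions, {"Account No."},
        Orders, {"Acct"},
        "Orders", JoinKind.LeftOuter
    ),
    // Step 2: for each transaction, keep only that account's orders
    // whose Ship Date is within 30 days of the Post Date
    Windowed = Table.AddColumn(
        Merged,
        "Orders In Window",
        (t) => Table.SelectRows(
            t[Orders],
            (o) => o[Ship Date] >= Date.AddDays(t[Post Date], -30)
                and o[Ship Date] <= Date.AddDays(t[Post Date], 30)
        ),
        type table
    ),
    // Step 3: drop the unfiltered nested table and expand the filtered one
    Removed = Table.RemoveColumns(Windowed, {"Orders"}),
    Expanded = Table.ExpandTableColumn(
        Removed, "Orders In Window", {"Ship Date", "Retail"}
    )
in
    Expanded
```

Filtering the nested table before expanding, rather than filtering the expanded rows, is intended to preserve the left-join behaviour: a transaction with no orders in its window should still come through with null order columns. Whether this is actually faster on 300K+ rows would need to be tested on the real data.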

@Cindysc1218 I'm not suggesting anything; I merely notice that you seem to add a column with identical tables containing filtered records to each of the 300K rows, and I'm wondering why you don't filter the date column first and then the account number. As I said, though, that's difficult to judge without seeing the data. What stops you from trying it on a copy of your file? If it doesn't work, I clearly missed the point.

 

Or perhaps you could just add a column stating: 

= [Ship Date] >= [#"Post Date - 30"] and [Ship Date] <= [#"Post Date + 30"] and [#"Acct"] = [#"Account No."]

 

This should result in TRUE or FALSE. After that, filter TRUE to keep matching rows.

I hope I understand you correctly.
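
A minimal sketch of the flag-then-filter idea, assuming the transaction and order columns already sit side by side in one table (for example after a merge and expand). The sample rows and the step and column names `Flagged`, `Kept`, and "In Range" are hypothetical.

```powerquery
let
    // Hypothetical rows with transaction and order columns in one table
    Source = #table(
        type table [
            #"Account No." = text, Acct = text,
            #"Post Date - 30" = date, #"Post Date + 30" = date,
            #"Ship Date" = date
        ],
        {
            {"A1", "A1", #date(2023, 2, 13), #date(2023, 4, 14), #date(2023, 3, 1)},
            {"A1", "A1", #date(2023, 2, 13), #date(2023, 4, 14), #date(2023, 9, 1)},
            {"B2", "B2", #date(2023, 5, 2), #date(2023, 7, 1), #date(2023, 6, 20)}
        }
    ),
    // Add a logical column that is true when the order falls inside the window
    Flagged = Table.AddColumn(
        Source,
        "In Range",
        each [Ship Date] >= [#"Post Date - 30"]
            and [Ship Date] <= [#"Post Date + 30"]
            and [Acct] = [#"Account No."],
        type logical
    ),
    // Keep only the rows flagged true
    Kept = Table.SelectRows(Flagged, each [In Range] = true)
in
    Kept
```

Comparing against `true` explicitly means a row whose flag evaluates to null (for instance, because a date is missing) is dropped rather than raising an error in the filter step.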

For your first suggestion, I filtered dates in regular Excel before using PQ. For your second suggestion, doesn't that mean every row in the first table has to join with every row in the second table in order to find out if the conditional is true or not?
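
The concern about every row being compared with every other row can be illustrated by counting candidate pairs. The sketch below, using hypothetical sample tables and step names, contrasts a full cross comparison with one that merges on the account key first, so the date condition only has to be checked within each account.

```powerquery
let
    // Hypothetical sample data, not the poster's tables
    Transactions = #table(
        type table [#"Account No." = text, #"Post Date" = date],
        {
            {"A1", #date(2023, 3, 15)},
            {"B2", #date(2023, 6, 1)}
        }
    ),
    Orders = #table(
        type table [Acct = text, #"Ship Date" = date],
        {
            {"A1", #date(2023, 3, 1)},
            {"A1", #date(2023, 9, 1)},
            {"B2", #date(2023, 6, 20)}
        }
    ),
    // Every transaction compared with every order
    CrossPairs = Table.RowCount(Transactions) * Table.RowCount(Orders),
    // Merge on account first, so each transaction only sees its own account's orders
    Merged = Table.NestedJoin(
        Transactions, {"Account No."},
        Orders, {"Acct"},
        "Orders", JoinKind.LeftOuter
    ),
    // Total number of transaction/order pairs left to check by date
    AccountPairs = List.Sum(List.Transform(Merged[Orders], Table.RowCount))
in
    [CrossPairs = CrossPairs, AccountPairs = AccountPairs]
```

On this toy data the cross comparison has 6 pairs and the per-account comparison has 3. With 300K+ rows spread over many accounts the gap would presumably be much larger, though the real ratio depends on how many orders each account has.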

@Cindysc1218 Can't answer as I'm having a hard time visualizing your data. Sorry.

No worries. Thank you for the suggestions anyway.