Forum Discussion
Cindysc1218
May 27, 2022, Copper Contributor
Excel and Power Query: PQ is taking hours to execute
Hi, everyone:
I am new to Power Query and not sure why it is taking hours to execute the code below. Because the data is sensitive, I am unable to share the file. However, I can say that the file contains 300K+ records, and the following are the PQ queries I am executing (note: variable names have been modified):
= Table.NestedJoin(#"Transactions", {"Account No."}, #"Orders", {"Acct"}, "Orders", JoinKind.LeftOuter)
= Table.AddColumn(#"Transactions", "Merge by Date Range", each Table.SelectRows(#"Orders", (x) => x[Ship Date] >= [#"Post Date - 30"] and x[Ship Date] <= [#"Post Date + 30"] and x[#"Acct"] = [#"Account No."]))
= Table.ExpandTableColumn(#"Added Custom", "Merge by Date Range", {"Acct", "Exp Type Name", "Product Description", "Units", "Retail", "Ship Date"}, {"Acct", "Exp Type Name", "Product Description", "Units", "Retail", "Ship Date"})
= Table.Group(#"Expanded Merge by Date Range", {"Acct", "Ship Date"}, {{"AllData", each _, type table [#"Post Date - 30"=nullable date, #"Post Date"=nullable date, #"Post Date + 30"=nullable date, #" Account No."=nullable text, Type=nullable text, Amount=nullable number, Acct=nullable text, Exp Type Name=nullable text, Product Description=nullable text, Units=nullable number, Retail=nullable number, Ship Date=nullable date]}, {"Sum of Retail", each List.Sum([Retail]), type nullable number}})
= Table.ExpandTableColumn(#"Grouped Rows", "AllData", {"Post Date - 30", "Post Date", "Post Date + 30", "Account No.", "Type", "Amount", "Acct", "Exp Type Name", "Product Description", "Units", "Retail", "Ship Date"}, {"Post Date - 30", "Post Date", "Post Date + 30", " Account No.", "Account Type", "Amount", "Acct.1", "Exp Type Name", "Product Description", "Units", "Retail", "Ship Date.1"})
= Table.RemoveColumns(#"Expanded AllData",{"Acct", "Ship Date"})
= Table.RenameColumns(#"Removed Columns",{{"Acct.1", "Account No."}, {"Ship Date.1", "Ship Date"}})
= Table.Sort(#"Renamed Columns",{{"Account No.", Order.Ascending}, {"Post Date", Order.Ascending}, {"Ship Date", Order.Ascending}, {"Retail", Order.Descending}})
I have since learned about Table.Buffer, so I applied it to every statement. But it has not helped. What else do I need to do to make this quicker?
Thank you,
Cindy
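On the Table.Buffer point, here is a minimal sketch of buffering only the table that is scanned once per row, rather than every step. It assumes the queries are named Transactions and Orders as in the post; BufferedOrders and AddedCustom are hypothetical step names. Whether this helps depends on the data source, so treat it as something to try, not a guaranteed fix.

```powerquery
let
    // Assumption: Transactions and Orders are existing queries, named as in the post.
    // Buffer Orders once so the row-by-row filter below reads it from memory
    // instead of potentially re-evaluating the Orders query for every row.
    BufferedOrders = Table.Buffer(Orders),

    // Same predicate as the original second step, run against the buffered copy.
    // t is the current Transactions row, x is a candidate Orders row.
    AddedCustom = Table.AddColumn(
        Transactions,
        "Merge by Date Range",
        (t) => Table.SelectRows(
            BufferedOrders,
            (x) => x[Acct] = t[#"Account No."]
                and x[Ship Date] >= t[#"Post Date - 30"]
                and x[Ship Date] <= t[#"Post Date + 30"]
        )
    )
in
    AddedCustom
```

Even with the buffer, this still scans all of Orders once for each of the 300K+ transaction rows, so the amount of work grows with the product of the two table sizes.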
- Riny_van_Eekelen (Platinum Contributor)
Cindysc1218 It may be difficult to judge, based on the code alone, but it seems that in the second step you add a column with filtered content. If I'm not mistaken, that creates a column with 300K identical tables of filtered records, that you then Expand and Group. Could you not just use two filter steps (i.e. Table.SelectRows) and end with something like this?
Filter1 = Table.SelectRows(#"Orders", each [Ship Date] >= [#"Post Date - 30"] and [Ship Date] <= [#"Post Date + 30"]),
Filter2 = Table.SelectRows(Filter1, each [Acct] = [#"Account No."])
Not being able to test any of this, I may be missing the point altogether.
- Cindysc1218 (Copper Contributor)
That second step is something I had to research to work out how to do what I need.
What I need to do is merge (left join) two tables, using the account and the dates as my keys. I understand how to do an exact match on accounts to merge the tables, but I cannot do an exact match on dates, which is why I am attempting to match when Ship Date falls within 30 days before or after the Post Date. After spending much time online, the code in step 2 appears to do what I need. Are you saying that your solution can do the same thing?
- Riny_van_Eekelen (Platinum Contributor)
Cindysc1218 I'm not suggesting anything; I merely notice that you seem to add a column with identical tables containing filtered records to each of the 300K rows. I'm just wondering why you don't filter on the date column first and then on the account number. But as I said, that's difficult to judge without seeing the data. What stops you from trying it on a copy of your file? If it doesn't work, I clearly missed the point.
Or perhaps you could just add a column stating:
= [Ship Date] >= [#"Post Date - 30"] and [Ship Date] <= [#"Post Date + 30"] and [Acct] = [#"Account No."]
This should result in TRUE or FALSE. After that, filter TRUE to keep matching rows.
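The "match on account, then filter on the date window" idea discussed in this thread could be sketched roughly as follows. This is only an illustration under assumptions: Transactions and Orders are existing queries with the column names from the original post, the "Post Date - 30" and "Post Date + 30" columns contain no nulls, and Merged, Expanded and InWindow are hypothetical step names. It has not been run against the poster's data.

```powerquery
let
    // Step 1: exact-match left join on the account number only.
    Merged = Table.NestedJoin(
        Transactions, {"Account No."},
        Orders, {"Acct"},
        "Orders", JoinKind.LeftOuter
    ),

    // Step 2: bring the order columns alongside each transaction row.
    Expanded = Table.ExpandTableColumn(
        Merged, "Orders",
        {"Exp Type Name", "Product Description", "Units", "Retail", "Ship Date"}
    ),

    // Step 3: keep only rows whose Ship Date falls inside the 30-day window.
    // The null check comes first so unmatched rows (null Ship Date) evaluate
    // to false instead of producing a null condition.
    InWindow = Table.SelectRows(
        Expanded,
        each [Ship Date] <> null
            and [Ship Date] >= [#"Post Date - 30"]
            and [Ship Date] <= [#"Post Date + 30"]
    )
in
    InWindow
```

Note that the final filter also drops transactions that have no order inside the window, so the result behaves like an inner join on account plus date range. If those unmatched transactions must be kept, they would need to be added back in a further step.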