Jul 01 2020 12:48 AM - edited Jul 01 2020 12:49 AM
Hi guys!
Before I start with my question, please, take a look the pictures that I attached.
My source is folder X which contains 3 files (but also can contain x files and problem is still there). What I do: I import data in Power Query, delete column with file name, sort them by date from newest to oldest. Then I want to know what is last status of item that is bar coded. When I remove duplicates (bar Code column) I only get status from the 1st file, but I want to get status with newest date whatever is position of file in folder.
What is the trick?
Jul 01 2020 12:28 PM
SolutionRemove Duplicates keeps first of duplicated rows in order of how you sorted records before. However, to ensure it works correctly before removing duplicates you need to fix the table in memory. You may wrap it by Table.Buffer, or easier to add Index column->select column(s) for which remove duplicates->remove them->remove Index column.
Jul 02 2020 12:40 AM
@Sergei Baklan Thank You again for Your help. I used Table.Buffer and it works perfectly now. After You advised me to use this, I wasn't sure how to use it so I found it on https://exceleratorbi.com.au/remove-duplicates-keep-last-record-power-query/
Jul 02 2020 01:30 PM
Sorry didn't explain Table.Buffer() in more details. But again, simplest way is to Add index->Sort->Remove index
Anyway, glad to know you sorted the issue out.
Apr 02 2021 04:01 AM
Thank you for the tip! I used buffer previously, but with lots of flat files and millions of rows refresh of a dataflow became rather slow. Below my initial timings with Table.Buffer / Sort / RemoveDuplicates.
Bytes processed (KB) | Max commit (KB) | Processor Time |
273836 | 2134400 | 30:47.9 |
As the production data set will be four times bigger and growing I got upset. But thanks to your advice same results were achieved faster and with less effort on the capacity side:
Bytes processed (KB) | Max commit (KB) | Processor Time |
211547 | 1488424 | 08:57.5 |
Apr 02 2021 04:19 AM
Unfortunately not, optimising of performance in significant part is the art, not only technology, it individual for each concrete case. It's always better to follow Best practices when working with Power Query | Microsoft Docs even they are not directly affect the performance. As a minimum check if query folding works, you have modular structure of the queries and you minimised repeated refresh of them.
Jul 01 2020 12:28 PM
SolutionRemove Duplicates keeps first of duplicated rows in order of how you sorted records before. However, to ensure it works correctly before removing duplicates you need to fix the table in memory. You may wrap it by Table.Buffer, or easier to add Index column->select column(s) for which remove duplicates->remove them->remove Index column.