Forum Discussion

Zarko_Tripunovic

Copper Contributor

Jul 01, 2020

Solved

Removing duplicates in PowerQuery

Hi guys! Before I start with my question, please, take a look the pictures that I attached. My source is folder X which contains 3 files (but also can contain x files and problem is still there). ...

SergeiBaklan
Jul 01, 2020
Zarko_Tripunovic

Remove Duplicates keeps first of duplicated rows in order of how you sorted records before. However, to ensure it works correctly before removing duplicates you need to fix the table in memory. You may wrap it by Table.Buffer, or easier to add Index column->select column(s) for which remove duplicates->remove them->remove Index column.

Diamond Contributor

Jul 01, 2020

Zarko_Tripunovic

Remove Duplicates keeps first of duplicated rows in order of how you sorted records before. However, to ensure it works correctly before removing duplicates you need to fix the table in memory. You may wrap it by Table.Buffer, or easier to add Index column->select column(s) for which remove duplicates->remove them->remove Index column.

Zarko_Tripunovic
Copper Contributor
Jul 02, 2020
SergeiBaklan Thank You again for Your help. I used Table.Buffer and it works perfectly now. After You advised me to use this, I wasn't sure how to use it so I found it on https://exceleratorbi.com.au/remove-duplicates-keep-last-record-power-query/
- SergeiBaklan
  Diamond Contributor
  Jul 02, 2020
  Zarko_Tripunovic
  
  Sorry didn't explain Table.Buffer() in more details. But again, simplest way is to Add index->Sort->Remove index
  
  Anyway, glad to know you sorted the issue out.
  - truemh
    Copper Contributor
    Apr 02, 2021
    SergeiBaklan
    Thank you for the tip! I used buffer previously, but with lots of flat files and millions of rows refresh of a dataflow became rather slow. Below my initial timings with Table.Buffer / Sort / RemoveDuplicates.
    Bytes processed (KB) Max commit (KB) Processor Time
    273836 2134400 30:47.9
    As the production data set will be four times bigger and growing I got upset. But thanks to your advice same results were achieved faster and with less effort on the capacity side:
    Bytes processed (KB) Max commit (KB) Processor Time
    211547 1488424 08:57.5