Forum Discussion

Copper Contributor

Aug 04, 2017

Solved

MaxIfIs vs Earliest vs Aggregation to Find Douplicates {Get&Transform}

After having a view looks on variouse forums I found that I was not the first with the same data filtering requirement of my source data. Case: You have a Dataset where you need to append freque...

BI & Data Analytics

Requests

Elmar Bischof
Aug 08, 2017
Hi.

After playing arround for it for a couple of days I found a quicker solution than #2, while using the Query Editor.

Hence here #4:
Import your data and duplicate it.
Remove all columuns in one data set up to the Uniqe ID and the "Date Created" of the source file.
Use "GroupBy" within "Transform". Group by the ID, return Max of the Date.
Select both columns, go to "Transform" and use "Merge Columns"
Change the "Load To" to of this Grouped dataset to connection only without loading it to the datamodel.
Go back to the other dataset that you intend to load into the Datamodel and merge the "Date created" and your Unique ID
Make an Inner Join
I get a refresh time of less then one minute.
Still not the quick single Table solution without the use of "Join" I have been looking for, though at least it is within the "Query Editor", quick and keeps the file size small.

Ps.: You get "Date Created" at the very first ste of Applied Steps, under "Source"

Elmar Bischof

Copper Contributor

Aug 04, 2017

Thank you for your insight. Indeed Number 2 is the fastest, though also creates problems of creating relationships afterwards within the Datamodel, as you'd run into having douplicates at what are suposed to unique ID's.

Personaly I am currently experimenting on changing the function that is showin in Aggregation and potentially have a Table.???({?}) function based on "List.Max" or "List.Min" of the Aggregation part.

Like
Table.TransformColumns

Record.TransformFields

(from https://msdn.microsoft.com/en-us/library/mt296614.aspx)

Here the Aggregation.

= Table.AggregateTableColumn(#"Merged Queries", "NewColumn", {{"Date created", List.Max, "Max of NewColumn.Date created"}})

There must be a quicker way of doing it in G&T. Otherwhise sourcing from a foulder would be half as usefull.

Elmar Bischof

Copper Contributor

Aug 08, 2017

Hi.

After playing arround for it for a couple of days I found a quicker solution than #2, while using the Query Editor.

Hence here #4:

Import your data and duplicate it.
Remove all columuns in one data set up to the Uniqe ID and the "Date Created" of the source file.
Use "GroupBy" within "Transform". Group by the ID, return Max of the Date.
Select both columns, go to "Transform" and use "Merge Columns"
Change the "Load To" to of this Grouped dataset to connection only without loading it to the datamodel.
Go back to the other dataset that you intend to load into the Datamodel and merge the "Date created" and your Unique ID
Make an Inner Join

I get a refresh time of less then one minute.

Still not the quick single Table solution without the use of "Join" I have been looking for, though at least it is within the "Query Editor", quick and keeps the file size small.

Ps.: You get "Date Created" at the very first ste of Applied Steps, under "Source"