Forum Discussion
MaxIfIs vs Earliest vs Aggregation to Find Douplicates {Get&Transform}
- Aug 08, 2017
Hi.
After playing arround for it for a couple of days I found a quicker solution than #2, while using the Query Editor.
Hence here #4:
- Import your data and duplicate it.
- Remove all columuns in one data set up to the Uniqe ID and the "Date Created" of the source file.
- Use "GroupBy" within "Transform". Group by the ID, return Max of the Date.
- Select both columns, go to "Transform" and use "Merge Columns"
- Change the "Load To" to of this Grouped dataset to connection only without loading it to the datamodel.
- Go back to the other dataset that you intend to load into the Datamodel and merge the "Date created" and your Unique ID
- Make an Inner Join
I get a refresh time of less then one minute.
Still not the quick single Table solution without the use of "Join" I have been looking for, though at least it is within the "Query Editor", quick and keeps the file size small.
Ps.: You get "Date Created" at the very first ste of Applied Steps, under "Source"
Thank you for your insight. Indeed Number 2 is the fastest, though also creates problems of creating relationships afterwards within the Datamodel, as you'd run into having douplicates at what are suposed to unique ID's.
Personaly I am currently experimenting on changing the function that is showin in Aggregation and potentially have a Table.???({?}) function based on "List.Max" or "List.Min" of the Aggregation part.
Like
Table.TransformColumns
Record.TransformFields
(from https://msdn.microsoft.com/en-us/library/mt296614.aspx)
Here the Aggregation.
= Table.AggregateTableColumn(#"Merged Queries", "NewColumn", {{"Date created", List.Max, "Max of NewColumn.Date created"}})
There must be a quicker way of doing it in G&T. Otherwhise sourcing from a foulder would be half as usefull.
Hi.
After playing arround for it for a couple of days I found a quicker solution than #2, while using the Query Editor.
Hence here #4:
- Import your data and duplicate it.
- Remove all columuns in one data set up to the Uniqe ID and the "Date Created" of the source file.
- Use "GroupBy" within "Transform". Group by the ID, return Max of the Date.
- Select both columns, go to "Transform" and use "Merge Columns"
- Change the "Load To" to of this Grouped dataset to connection only without loading it to the datamodel.
- Go back to the other dataset that you intend to load into the Datamodel and merge the "Date created" and your Unique ID
- Make an Inner Join
I get a refresh time of less then one minute.
Still not the quick single Table solution without the use of "Join" I have been looking for, though at least it is within the "Query Editor", quick and keeps the file size small.
Ps.: You get "Date Created" at the very first ste of Applied Steps, under "Source"