Forum Discussion
MaxIfs vs Earliest vs Aggregation to Find Duplicates {Get&Transform}
- Aug 08, 2017
Hi,
I don't know if other options exist; perhaps they do. As for me, I'd take #2, and I'm almost sure it will be the most efficient.
G&T is slow and inefficient when working with big arrays of data. Microsoft is working on performance improvements, but so far we have what we have.
The data model engine is very well optimized for working with large amounts of data, so keeping the duplicates there shouldn't be a big problem.
Of course, a lot depends on how well your M script is optimized for #3, and on the data model structure and DAX formulas for #2.
- Elmar Bischof, Aug 04, 2017, Copper Contributor
Thank you for your insight. Indeed, #2 is the fastest, though it also creates problems when building relationships afterwards within the data model, as you'd run into duplicates in what are supposed to be unique IDs.
Personally, I am currently experimenting with changing the function that is shown in the aggregation, potentially using a Table.???({?}) function based on "List.Max" or "List.Min" in the aggregation part, like
Table.TransformColumns or Record.TransformFields
(from https://msdn.microsoft.com/en-us/library/mt296614.aspx)
Here the aggregation:
= Table.AggregateTableColumn(#"Merged Queries", "NewColumn", {{"Date created", List.Max, "Max of NewColumn.Date created"}})
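For context, a minimal sketch of where that aggregation step might sit in a full M query; the self-join via Table.NestedJoin and the table/column names ("Data", "ID", "Date created") are illustrative assumptions, not the exact query from this thread:

    let
        // Illustrative source table with a repeating "ID" and a "Date created" column
        Source = Data,
        // Join the table to itself on ID so each row gets a nested table of all rows sharing its ID
        #"Merged Queries" = Table.NestedJoin(Source, {"ID"}, Source, {"ID"}, "NewColumn", JoinKind.Inner),
        // Collapse the nested table down to the latest date per ID
        Aggregated = Table.AggregateTableColumn(#"Merged Queries", "NewColumn",
            {{"Date created", List.Max, "Max of NewColumn.Date created"}})
    in
        Aggregated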
There must be a quicker way of doing it in G&T. Otherwise, sourcing from a folder would be only half as useful.
- Elmar Bischof, Aug 08, 2017, Copper Contributor
Hi.
After playing around with it for a couple of days, I found a quicker solution than #2 while staying in the Query Editor.
Hence, here is #4:
- Import your data and duplicate it.
- Remove all columns in one dataset except the unique ID and the "Date Created" of the source file.
- Use "Group By" within "Transform": group by the ID and return the Max of the date.
- Select both columns, go to "Transform" and use "Merge Columns".
- Change the "Load To" of this grouped dataset to connection only, without loading it to the data model.
- Go back to the other dataset that you intend to load into the data model and merge the "Date created" and your unique ID in the same way.
- Make an inner join between the two (see the M sketch below).
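As a rough illustration of those steps in M, here is a minimal sketch under stated assumptions; the query and column names ("Source", "ID", "Date Created", "GroupedKeys") are placeholders for the example, not taken from the original workbook:

    // Query 1, "GroupedKeys" (set Load To: connection only)
    let
        Data = Source,  // the imported data, duplicated from the main query
        // Group by ID and keep the latest Date Created per ID
        Grouped = Table.Group(Data, {"ID"}, {{"MaxDate", each List.Max([Date Created]), type datetime}}),
        // Merge Columns works on text, so convert before combining into one key
        AsText = Table.TransformColumnTypes(Grouped, {{"ID", type text}, {"MaxDate", type text}}),
        MergedKey = Table.CombineColumns(AsText, {"ID", "MaxDate"}, Combiner.CombineTextByDelimiter("|"), "Key")
    in
        MergedKey

    // Query 2 (loaded to the data model)
    let
        Data = Source,
        AsText = Table.TransformColumnTypes(Data, {{"ID", type text}, {"Date Created", type text}}),
        WithKey = Table.CombineColumns(AsText, {"ID", "Date Created"}, Combiner.CombineTextByDelimiter("|"), "Key"),
        // The inner join keeps only rows whose ID + date match the per-ID maximum,
        // i.e. the latest row per ID survives
        Joined = Table.NestedJoin(WithKey, {"Key"}, GroupedKeys, {"Key"}, "Match", JoinKind.Inner)
    in
        Joined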
I get a refresh time of less than one minute.
Still not the quick single-table solution without the use of "Join" that I have been looking for, but at least it stays within the Query Editor, is quick, and keeps the file size small.
P.S.: You get "Date Created" at the very first step of the Applied Steps, under "Source".
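To illustrate that P.S.: when sourcing from a folder, the Source step itself already exposes file metadata, including a "Date created" column (the folder path below is just a placeholder):

    // Folder.Files returns one row per file with metadata columns such as
    // Name, Extension, Date accessed, Date modified, Date created and Folder Path
    Source = Folder.Files("C:\Data")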