Forum Discussion
Combining data from multiple rows into a single row taking earliest date
I have a data set that looks like this:
| ID | Name | Driver's Liscence | Submission Date | Approval Date |
| 341 | Kenner, Jen | 32214 | 9/16/19 | 9/15/19 |
| 318 | Kenner, Jen | 32214 | 9/18/19 | 9/19/19 |
| 302 | Kenner, Jen | 32214 | 9/19/19 |
I want to combine the rows with the same name despite different IDs and take the earliest submission date (i.e. 9/16/19) and the oldest approval date (i.e. 9/19/19) and have them appear as one row (condensing the info). So maybe it would look something like this:
| ID_new | Name | Driver's Liscence | Submission Date_new | Approval Date_new |
| 341318302(?, or generating a whole new ID #) | Kenner, Jen | 32214 | 9/16/19 | 9/19/19 |
Any help is greatly appreciated!!
2 Replies
- SergeiBaklanDiamond Contributor
For such model
If with formulas:
Select unique names
=IFERROR(INDEX($C$4:$C$7,AGGREGATE(15,6,1/(COUNTIF($I$3:$I3,$C$4:$C$7)=0)*(ROW($C$4:$C$7)-ROW($C$3)),1)),"")Driver license
=IFNA(INDEX($D$4:$D$7,MATCH($I4,$C$4:$C$7,0)),"")Submission date (earliest in list)
=IFERROR(AGGREGATE(15,6,1/($C$4:$C$7=$I4)*$E$4:$E$7,1),"")Approval date (latest one)
=IFERROR(AGGREGATE(14,6,1/($C$4:$C$7=$I4)*$F$4:$F$7,1),"")Combined ID:
=TEXTJOIN("",TRUE,IF($C$4:$C$7=$I4,$B$4:$B$7,""))and drag all down till empty cells appear.
If with Power Query:
Query the list, Group by Names with proper aggregations on other columns, return back adding some cosmetic:
let Source = Excel.CurrentWorkbook(), SourceList = Source{[Name="SourceList"]}[Content], PromotHeaders = Table.PromoteHeaders( SourceList, [PromoteAllScalars=true] ), FilterNulls = Table.SelectRows( PromotHeaders, each ([Name] <> null) ), ChangeType = Table.TransformColumnTypes( FilterNulls, { {"ID", type text}, {"Name", type text}, {"Driver's Liscence", type text}, {"Submission Date", type date}, {"Approval Date", type date} } ), GroupNames = Table.Group( ChangeType, {"Name"}, { {"ID", each Text.Combine([ID]), type text}, {"Driver's Liscence", each List.First([#"Driver's Liscence"]), type text}, {"Submission Date", each List.Min([Submission Date]), type date}, {"Approval Date", each List.Max([Approval Date]), type date} } ), ReorderColumns = Table.ReorderColumns( GroupNames, {"ID", "Name", "Submission Date", "Approval Date"} ) in ReorderColumns - mathetesGold Contributor
Could you upload an abbreviated version of your actual spreadsheet, so we don't need to recreate it?
Give us also an idea of the scope of your data: how many rows, how many duplicate names, typically?
An observation: one of the general rules of good database design is to never make "name" a field that includes first and last; rather, make each name a field of its own. It's too late for that now, but if you can at the same time break that field into two (or more, if you've got suffixes or prefixes) then do it. It's far easier to assemble two names into one--for printing output--than it is to break them apart.