Consolidate duplicates in power query

Copper Contributor

I have successfully combined two different sources of data that have multiple duplicates.  When I remove duplicates I lose the data from the secondary sheet that I used in the query.  I want to consolidate my duplicates with all data from both sheets.  Is this possible?  

3 Replies

@DRothwein 

Is it possible to illustrate on small sample file?

It's not clear under "combined two different sources" you mean appending or merging. If merging, which one?

"Duplicates" means exactly the same values in all columns or in few or only ID:s?

Depends on that what do you mean under consolidation, which shall be the result?

 

Here is an example where i have appended together the two different sheets. The first two columns have 97 duplicates. You can see where the information from the first spread sheet is in the first few columns and then the second spread sheet information is in the duplicate row. I would like the second spread sheet information to be in the same row and consolidate the data but its not letting me do this...

Snip it.PNG






@DRothwein 

I'm not sure why you append, not merge. Let take such sample

image.png

Desired result is

image.png

We could receive it by merging one cell with another and expanding resulting column with the table. Generated script is

let
    Source = Table.NestedJoin(Table1, {"ID"}, Table2, {"ID"}, "Table2", JoinKind.LeftOuter),
    #"Expanded Table2" = Table.ExpandTableColumn(Source, "Table2", {"Three", "Four"}, {"Three", "Four"})
in
    #"Expanded Table2"

If append replace blank values on null (not sure why you don't have it by default), sort by ID, Fill Up and Fill Down proper columns, select all and remove duplicates. This way is not very reliable, depends on your concrete data. Script could be like

let
    Source = Table.Combine({Table1, Table2}),
    #"Sorted Rows" = Table.Sort(Source,{{"ID", Order.Ascending}}),
    #"Filled Down" = Table.FillDown(#"Sorted Rows",{"One", "Two"}),
    #"Filled Up" = Table.FillUp(#"Filled Down",{"Three", "Four"}),
    #"Removed Duplicates" = Table.Distinct(#"Filled Up")
in
    #"Removed Duplicates"