Forum Discussion

DRothwein's avatar
DRothwein
Copper Contributor
Feb 26, 2021

Consolidate duplicates in power query

I have successfully combined two different sources of data that have multiple duplicates.  When I remove duplicates I lose the data from the secondary sheet that I used in the query.  I want to consolidate my duplicates with all data from both sheets.  Is this possible?  

3 Replies

  • SergeiBaklan's avatar
    SergeiBaklan
    Diamond Contributor

    DRothwein 

    Is it possible to illustrate on small sample file?

    It's not clear under "combined two different sources" you mean appending or merging. If merging, which one?

    "Duplicates" means exactly the same values in all columns or in few or only ID:s?

    Depends on that what do you mean under consolidation, which shall be the result?

     

    • DRothwein's avatar
      DRothwein
      Copper Contributor

      Here is an example where i have appended together the two different sheets. The first two columns have 97 duplicates. You can see where the information from the first spread sheet is in the first few columns and then the second spread sheet information is in the duplicate row. I would like the second spread sheet information to be in the same row and consolidate the data but its not letting me do this...






      • SergeiBaklan's avatar
        SergeiBaklan
        Diamond Contributor

        DRothwein 

        I'm not sure why you append, not merge. Let take such sample

        Desired result is

        We could receive it by merging one cell with another and expanding resulting column with the table. Generated script is

        let
            Source = Table.NestedJoin(Table1, {"ID"}, Table2, {"ID"}, "Table2", JoinKind.LeftOuter),
            #"Expanded Table2" = Table.ExpandTableColumn(Source, "Table2", {"Three", "Four"}, {"Three", "Four"})
        in
            #"Expanded Table2"

        If append replace blank values on null (not sure why you don't have it by default), sort by ID, Fill Up and Fill Down proper columns, select all and remove duplicates. This way is not very reliable, depends on your concrete data. Script could be like

        let
            Source = Table.Combine({Table1, Table2}),
            #"Sorted Rows" = Table.Sort(Source,{{"ID", Order.Ascending}}),
            #"Filled Down" = Table.FillDown(#"Sorted Rows",{"One", "Two"}),
            #"Filled Up" = Table.FillUp(#"Filled Down",{"Three", "Four"}),
            #"Removed Duplicates" = Table.Distinct(#"Filled Up")
        in
            #"Removed Duplicates"

Resources