Forum Discussion

TheCharWest's avatar
TheCharWest
Copper Contributor
Oct 07, 2021
Solved

Comparing Duplicates in Column A with Duplicates in Column B

Hello! I'm hoping there's a way to do this but I haven't figured it out yet so any help would be greatly appreciated. 

 

I have two columns of data which have duplicate values in:

FruitQuantity
Apple10
Apple10
Apple10
Banana5
Banana5
Banana4
Orange10
Orange3

I'd like to be able to identify whether the duplicate rows in column A match up to the duplicate rows in column B. So, in my example, I'd like to be able to pick out "Apple" and "10" as a duplicate in both columns at the same time. As "Orange" also has a value of 10, it's not just a case of looking at the duplicates in both columns separately.

 

To make things more complicated, I also want to exclude "Bananas" from this as not all of its quantity values are duplicates, only two of them are. I'm unsure how best to do this. 

 

Any help would be amazing! Thanks in advance!

  • I solved it using the following formula:
    =AND(COUNTIFS($A$2:$A$11,A2,$B$2:$B$11,B2)>1,C2="Primary")

     

    Thanks for the help everyone!

7 Replies

  • TheCharWest's avatar
    TheCharWest
    Copper Contributor

    I solved it using the following formula:
    =AND(COUNTIFS($A$2:$A$11,A2,$B$2:$B$11,B2)>1,C2="Primary")

     

    Thanks for the help everyone!

  • TheCharWest 

    I returned the result because Apple is the only fruit not to feature in multiple distinct records.

    = LET(
        FruitFreq,  COUNTIFS(Fruit, Fruit),
        RecordFreq, COUNTIFS(Fruit, Fruit, Quantity, Quantity),
        UNIQUE(FILTER(Data, RecordFreq=FruitFreq))
      )
  • TheCharWest 

    I understand you want to identify fruits which have only duplicates, like "Apples" and "10" in your example. Enter formula in D1 and copy down:

    =COUNTIF($A$1:$A$8,A1)=COUNTIFS($A$1:$A$8,A1,$B$1:$B$8,B1)

    Please compare / check solution in attached file.

    • TheCharWest's avatar
      TheCharWest
      Copper Contributor

      Hi SergeiBaklan and OliverScheurich - you have both given me answers which are actually really close to what I need, but I realised there's another complication to my data. I would actually like to pull out "Banana" as "True" as well, but only because its duplicate values are the highest values. If there was another "Orange - 5", I wouldn't want to include it as "Orange - 10" is higher.

      Essentially, my data looks like this:
      Fruit Quantity Primary
      Apple 10 Primary
      Apple 10 Primary
      Apple 10 Primary
      Banana 5 Primary
      Banana 5 Primary
      Banana 4 Duplicate
      Orange 10 Primary
      Orange 3 Duplicate
      Orange 3 Duplicate

      And I'd like to pick out where there is more than one Primary row per duplicate fruit. The primary record is the duplicate with the highest quantity score.

      • SergeiBaklan's avatar
        SergeiBaklan
        Diamond Contributor

        TheCharWest 

        Sorry, I didn't catch. Would you like "Banana 4 Duplicate" also shall be included? Perhaps you may add column within your file with results as they shall be in accordance to your logic.

Resources