how to identify duplicates in one column based on a common value in another column?

Copper Contributor

For quite a while an automation was adding duplicate contacts to a Project, so a person is named sometimes many times on a Project. I need to delete the duplicates (shown below in bold).  I have a two-column spreadsheet with:

  1. Project
  2. Project Contact(s) - ok to have multiple people, but never the same name more than once.

If a Project has more than one Project Contact, there is a row for each of those contacts, like this:

 

Red ProjectMary Jones
Red ProjectMary Jones
Red ProjectJim Stone

Blue Project

Bart Brown

Blue Project

Bart Brown

 

I know I need to sort by the Project, but how do I identify which names (B) are duplicate names on the same Project (A)? Is there a feature in Conditional Formatting or Remove Duplicates that will allow me to do this? I need something to flag the duplicate rows so I can delete them, but I don't know the tools well enough for this one. I've included a small sample of real data, thanks!

 

 

2 Replies
Back up your file before trying.

Select/highlight your table.
Click Data, Remove Duplicates.
Check the boxes for the columns that contain project name and person name (uncheck any others).
Click OK.

@sharphaus In order to see the duplicated rows, select your both column data range. Conditional Formatting/Highlight Cell rules/Duplicate Values/OK

All duplicated rows in your data range should be highlighted in red fill with red text (Important note: the highlighted rows help you to see which rows are duplicated but they should never been used for deletion because if you delete all highlighted rows you are also deleting the 1st occurrence of the duplicates)

 

If your dataset is small, you can follow the instructions in the other reply to remove the duplicated.

 

However, because duplicate removal cannot be undone, to ensure that I only remove the duplicated data when my dataset is large, I usually do this way:

1. Assume your data is A2:B7 (A1:B1 is heading)

2. In C1 type Count as the heading and in C2 type: =COUNTIFS(A2:$A$7,A2,B2:$B$7,B2)

Note that you need to make the ending cell of the data range in the above formula as absolute reference with $ sign, whereas the opening cell is relative reference without $ sign)

3. Add filter to the heading of A1:C1

4. Filter column C for any number larger than 1 (i.e. 2, 3, etc) then delete those filtered rows

5. Unfilter column C and now you should expect to see only the number 1 in that column, meaning any row shown is unique.