Find duplicates of 2 columns

Brass Contributor

I cant seem to find anything specific to my question on finding duplicates so here goes

 

I have a large spreadsheet of 80000+ rows and maybe 20 columns.  Sample of what the first 6 columns look like below.  I need to find duplicates of Col1 and Col5 together.  For example bit col2 and col5 will have hundreds of duplicate cells but there should be very little cases where both are duplicated.

 

Row 4 and 5 is an example where both cells are duplicate.  How do I run a search over the entire workbook

 

Col1Col2Col3Col4Col5Col6
localServer1local1.2.3.4375630TRUE
localServer1local1.2.3.4375442TRUE
localServer1local1.2.3.4370588TRUE
localServer1local1.2.3.4370183TRUE
localServer1local1.2.3.4370183TRUE
localServer1local1.2.3.4122807TRUE
4 Replies
Hi,
I am sure there is a much smarter formulaic method to do this but here is a crude way to do it. If you were to create a column 7 which is a concat of column 1 and column 5 (assuming you are looking for duplicates where col 1 and col 5 match). Then use conditional formatting in column 7 to look for duplicates. The rows that are then highlighted with your conditional formatting colors are rows with duplicate values. Again I am sure there is a much better way to accomplish this.

Best,
Bp

@Bipin-prakash Thanks.  That would definitely work for my immediate needs.  I'll try that until anyone else has another way of doing this that might be a little more straightforward.

@lfk73 

An alternative:

Select the data range. In the following, I will assume that it starts in row 2 and that the active cell in the selection is in row 2, for example A2.

On the Home tab of the ribbon, select Conditional Formatting > New Rule...

Select 'Use a formula to determine which cells to format'.

Enter the formula

=COUNTIFS($A:$A,$A2,$E:$E,$E2)>1

(A and E are the columns you want to look for duplicates in).

Click Format...

Activate the Fill tab.

Select a color.

Click OK, then click OK again.

S1566.png

If you turn on Filter, you can filter by color to show only the duplicate rows:

S1567.png

@lfk73 

This also uses COUNTIFS but to prepare a report rather than highlighting cells.  Firstly, the process of testing every row against every other row is computationally very intensive.  The order in which the tests are performed is important for larger datasets.  

Since I use 365, I was able to filter the data, retaining rows with multiple matches for columns 5 and 1.  Where the records are complete duplicates, UNIQUE gives a single record for each.  If other properties differ both records are retained.

image.png

= LET(
    duplicated?, COUNTIFS(Column5,Column5,Column1,Column1) > 1,
    SORT(
      UNIQUE(FILTER(DataTbl, duplicated?)),
      {1,5},
    1)
  )