Forum Discussion

Bob_lponge's avatar
Bob_lponge
Copper Contributor
Mar 02, 2022

Finding duplicates

Is there a way to find duplicates based on initials and date of birth?

For example:

 

Joe Bloggs 16/01/1977
Susan Smith 14/4/1987

Brian Red 07/08/1957

Jake Boss 16/01/1977

 

So a formula that would highlight Joe Bloggs and Jake Boss as potential duplicates? Due to the J, B and date of birth matching?

9 Replies

  • Bob_lponge 

    =SUMPRODUCT(N(LEFT($A$1:$A$4,1)=LEFT(A1,1))*(MID($A$1:$A$4,FIND(" ",$A$1:$A$4)+1,1)=MID(A1,FIND(" ",A1)+1,1))*(RIGHT($A$1:$A$4,LEN($A$1:$A$4)-FIND(" ",$A$1:$A$4,FIND(" ",$A$1:$A$4)+1))=RIGHT(A1,LEN(A1)-FIND(" ",A1,FIND(" ",A1)+1))))>1

    An alternative could be this rule for conditional formatting if name and birthday are in one cell. 

    • mtarler's avatar
      mtarler
      Silver Contributor

      OliverScheurich  you should be able to simplify that based on the date format being exactly 10 characters (and combining the terms into a single comparison string):

       

       

       

      =SUMPRODUCT((LEFT($A:$A,1)&MID($A:$A,FIND(" ",$A:$A)+1,1)&RIGHT($A:$A,10))=
        (LEFT(A1,1)&MID(A1,FIND(" ",A1)+1,1)&RIGHT(A1,10)))>1

       

  • Bob_lponge 

    Let's say the names are in A2 and down, and the birth dates in B2 and down.

    If necessary, insert an empty column in column C.

    Enter the following formula in C2:

    =LEFT(A2)&MID(A2,FIND(" ",A2)+1,1)&B2

    You'll see something like JB28141. 28141 is the numeric representation of 16/01/1977.

    Fill down.

    Select column C.

    On the Home tab of the ribbon, select Conditional Formatting > Highlight Cells Rules > Duplicate Values... and click OK.

  • mtarler's avatar
    mtarler
    Silver Contributor

    so are those all in 1 cell, 2 cells or 3 separate cells? Lets assume 3 cells (and if not you could make them 3 cells using Text to Columns tool under the Data tab). Then under conditional formatting -> user your own formula:
    =SUMPRODUCT((LEFT($A1,1)&LEFT($B1,1)&$C1) = (LEFT($A:$A,1)&LEFT($B:$B,1)&$C:$C))>1

    • Bob_lponge's avatar
      Bob_lponge
      Copper Contributor
      Hi, sorry yes the characters are in 3 separate cells with headings. Are you inputting the formula into highlight cell rules and then more rules? It doesn’t seem to do anything when I input the formula
      • mtarler's avatar
        mtarler
        Silver Contributor

        I'm sorry, I just noticed I left the SUMPRODUCT() >1 off (thx Quadruple). try it now:
        =SUMPRODUCT(--((LEFT($A1,1)&LEFT($B1,1)&$C1) = (LEFT($A:$A,1)&LEFT($B:$B,1)&$C:$C)))>1
        but next question is if those dates are hand entered as a date or just text. If they are a date is there any chance they have a time also in which case you may need to replace each reference to C with INT($C1) to strip off the fractional time portion
        =SUMPRODUCT(
        (LEFT($A1,1)&LEFT($B1,1)&INT($C1)) = (LEFT($A:$A,1)&LEFT($B:$B,1)&INT($C:$C))
        )>1

Resources