
AsiaY
Copper Contributor
Jun 21, 2019

Text Filter

Hello All, 

 

Could you tell me how to compare two columns of individual names for differences while ignoring middle initials? 

 

I tried a filter that tests whether the names are essentially the same, but it returned "false" for names that differ only by a middle initial... 

 

Any suggestions?

18 Replies

  • SergeiBaklan
    Diamond Contributor

    AsiaY 

     

    A sample is in Sheet3 of the attached workbook, but I'm unsure about the logic:

    - Are names always written as LastName FirstName <MiddleName or Initial>, where the last part is optional? Or could they be written as LastName <MiddleName> FirstName?

     

    - Names such as "Calendar Amy L" and "Calendar Amy Lynn" are considered the same. But what about "Calendar Amy J", or whatever else is in the third position: is that the same "Calendar Amy"?

    • AsiaY
      Copper Contributor
      Yes, all names are listed last name first name middle initial suffix (if any) ...

      If the middle initial is different then yes it is a different person...

      I really appreciate your help ... let me know what you think.
      • SergeiBaklan
        Diamond Contributor

        AsiaY 

         

        One more question: if a name has no initial or middle name, is that person considered different from one who has one? For example:

        1) Calendar Amy

        2) Calendar Amy L

        3) Calendar Amy Lynn

        4) Calendar Amy H

         

        2) and 3) are the same. 4) is different from 2) and 3). What about 1)? Is it different from all of them, the same as 2) and 3), or the same as 4)? And if it is one of the latter two options, how do we know which one to take?
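        The four cases above can be written down as an explicit rule. What follows is a minimal Python sketch rather than an Excel formula, assuming names are ordered last name, first name, optional middle name or initial, with no suffix. The function name `same_person` and the `missing_matches` switch are illustrative only; the switch stands in for the still-open question about case 1).

        ```python
        def same_person(a, b, missing_matches=True):
            """Compare two 'Last First [Middle]' names (assumed layout, no suffix)."""
            # Ignore case and periods, then split on whitespace.
            ta = a.lower().replace(".", "").split()
            tb = b.lower().replace(".", "").split()
            # Last name and first name must agree exactly.
            if ta[:2] != tb[:2]:
                return False
            mid_a, mid_b = ta[2:3], tb[2:3]
            # Identical middle parts (including both missing) are the same person.
            if mid_a == mid_b:
                return True
            # Case 1): only one side has a middle part; the answer is still open.
            if not mid_a or not mid_b:
                return missing_matches
            # "L" matches "Lynn" because one is a prefix of the other; "H" does not.
            return mid_a[0].startswith(mid_b[0]) or mid_b[0].startswith(mid_a[0])


        print(same_person("Calendar Amy L", "Calendar Amy Lynn"))  # cases 2) and 3)
        print(same_person("Calendar Amy H", "Calendar Amy L"))     # cases 4) and 2)
        ```

        Whether `missing_matches` should be true or false depends on the answer to the question about case 1).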

         

  • SergeiBaklan
    Diamond Contributor

    For the collection, here is a Power Query variant:

    let
        // Load the workbook table named "Names"
        Source = Excel.CurrentWorkbook(){[Name="Names"]}[Content],
        // Add a column built from the first and last space-separated tokens of [Names]
        RemoveMiddle = Table.AddColumn(Source, "Different Names", each
            let
                splitNames = Splitter.SplitTextByDelimiter(" ", QuoteStyle.None)([Names])
            in
                Text.Combine({splitNames{0}, " ", List.Last(splitNames)}),
        type text),
        // Keep one row per distinct shortened name
        RemoveDuplicates = Table.Distinct(RemoveMiddle, {"Different Names"}),
        // Return only the new column
        RemoveSource = Table.SelectColumns(RemoveDuplicates,{"Different Names"})
    in
        RemoveSource

    Actually, no coding is required; the script is generated by Column From Examples.
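    For readers who do not use Power Query, the steps of the query above can be sketched in Python. This is only an illustration of the same logic (keep the first and last space-separated tokens, then drop duplicates); the function names are made up for the example.

    ```python
    def first_and_last(name):
        # Split on single spaces, as the query does with its " " delimiter.
        parts = name.split(" ")
        # Keep only the first and last tokens; anything in between is dropped.
        return parts[0] + " " + parts[-1]


    def distinct_first_and_last(names):
        # dict.fromkeys removes duplicates while preserving first-seen order.
        return list(dict.fromkeys(first_and_last(n) for n in names))


    print(distinct_first_and_last(["Tom B. Jones", "Tom H Jones", "James Smith"]))
    ```

    Because it keeps the first and last tokens, this logic suits names written as first, middle, last. For a name written last name first, such as "Calendar Amy L", it would keep "Calendar L", so that layout would need the first two tokens instead.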

    • AsiaY
      Copper Contributor

      SergeiBaklan 

       

      Hello Sergei, 

       

      Your example is great, but I only need to find names that differ between two columns, where the names are listed side by side but may be written differently, such as Calendar Amy L or Calendar Amy Lynn … 

       

      Any suggestions? 

    • PeterBartholomew1
      Silver Contributor

      SergeiBaklan 

      I think you are correct in that PQ is the way to go with data analysis problems.  In particular I welcome the degree of structure offered by Tables and PQ that is missing from normal spreadsheet usage.

       

      Now though, Dynamic Arrays (DAs) have hugely improved the usability of arrays for model building.  What is confusing is that the new DA functions also provide an alternative approach for data analysis steps (sorting, filtering etc). 

       

      There is now a big area of overlap where either methodology is viable.  I am not certain where the borderlines are in terms of which option should be recommended for which problems.  At the moment, it is a case of trying each and determining case by case which works out better.

  • AsiaY 

    Just to demonstrate that my post above wasn't merely the product of a deranged mind, I attach a functioning workbook (standard installation rather than insider).  The name 'testName' is now a relative reference to a cell in the column header and the sequence 'k' is now somewhat more turgid

    k: =COLUMN( INDEX(aRow,1):INDEX(aRow,n))

    [not that a number sequence should rely upon the definition of a somewhat arbitrary range]

  • AsiaY 

    Warning: This is an experiment testing formulas using modern Dynamic Arrays.

     

    n: = LEN(@testName); select a name from the column headers and determine its length

    k: = SEQUENCE(1,n); define a sequence counter for characters

    chr: = MID( @testName, k, 1 ); split the test name into separate characters

    firstName: = LEFT( @testName, MIN( IF( chr=" ", k ) ) ); extract first name

    lastName: = RIGHT( @testName, n - MAX(IF( chr=" ", k ) ) ); extract last name

    Worksheet formula:

    = ( LEFT(fullName, LEN(firstName)) = firstName) *

       (RIGHT(fullName, LEN(lastName)) = lastName);

    compares an array of full names against the test name.

     

                           Tom B. Jones   James A Smith
    Tom H Jones                 1               0
    James B. Jones              0               0
    Tom B Jones                 1               0
    Chris Williams              0               0
    James Smith                 0               1
    James X. L. Smith           0               1
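
    The comparison in the worksheet formula above can be sketched in Python to make the logic explicit. This is an illustration only, assuming the test name contains at least one space; the function name is made up, and the lowercasing mirrors the fact that Excel's `=` on text ignores case.

    ```python
    def matches_first_and_last(full_name, test_name):
        full = full_name.lower()
        test = test_name.lower()
        # First name: everything up to and including the first space,
        # like LEFT(testName, MIN(IF(chr=" ", k))).
        first = test[: test.index(" ") + 1]
        # Last name: everything after the last space,
        # like RIGHT(testName, n - MAX(IF(chr=" ", k))).
        last = test[test.rindex(" ") + 1 :]
        # Both the leading and the trailing part must agree.
        return full.startswith(first) and full.endswith(last)


    for name in ["Tom H Jones", "James B. Jones", "James X. L. Smith"]:
        print(name,
              int(matches_first_and_last(name, "Tom B. Jones")),
              int(matches_first_and_last(name, "James A Smith")))
    ```

    Note that this approach ignores whatever sits between the first and last name, so two different middle initials still count as a match.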
    • AsiaY
      Copper Contributor

      PeterBartholomew1 

       

      Hi Peter, 

       

      Is there a way to extract the data without having to use test names? The names in my two columns are listed without commas or other special characters, as last name, first name, middle name/initial... 

       

      I'm hoping to find a way to treat names as the same when they differ only by a middle name or initial... 

       

      For example, Hampton Sally E shows false when compared with Hampton Sally Egg...

       

      Any thoughts? 

  • Haytham Amairah
    Silver Contributor

    AsiaY

     

    Hi,

     

    Please try this formula to compare two names regardless of the middle initial:

    =TRIM(LEFT(A1,FIND(" ",LOWER(A1),1))) & " " & TRIM(MID(A1,FIND(" ",LOWER(A1),FIND(" ",LOWER(A1),1)+1)+1,LEN(A1)-FIND(" ",LOWER(A1),1)+1))=TRIM(LEFT(B1,FIND(" ",LOWER(B1),1))) & " " & TRIM(MID(B1,FIND(" ",LOWER(B1),FIND(" ",LOWER(B1),1)+1)+1,LEN(B1)-FIND(" ",LOWER(B1),1)+1))

     

    The formula comes from https://www.extendoffice.com/documents/excel/1779-excel-remove-middle-initial.html; I applied it to two cells and compared the results for equality using the = sign.

     

    Regards

    • AsiaY
      Copper Contributor

      Haytham Amairah 

       

      Is there a way to tweak the formula for names listed as last, first, middle initial? The formula works, except that it still shows false for names that are not written exactly the same; for example, Jones Santa C is false against Jones Santa...

       

      • Haytham Amairah
        Silver Contributor

        AsiaY

         

        It seems difficult to add this odd case to the formula, so I suggest moving the names that end with an initial to a separate sheet and comparing them using the formula below.

        Alternatively, you can put it in the column next to the previous formula as a second check.

         

        =IF(LEN(TRIM(A1))-LEN(TRIM(SUBSTITUTE(A1," ","")))+1=3,TRIM(SUBSTITUTE(A1,MID(A1,LEN(A1)-1,2),"")),A1)=IF(LEN(TRIM(B1))-LEN(TRIM(SUBSTITUTE(B1," ","")))+1=3,TRIM(SUBSTITUTE(B1,MID(B1,LEN(B1)-1,2),"")),B1)

         

        Hope that helps
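
        One caveat with the formula above: SUBSTITUTE replaces every occurrence of the last two characters, not only the trailing one, and the approach only covers one-letter initials. A token-based comparison avoids both issues. The following is a minimal Python sketch of that idea, not an Excel formula, assuming each name starts with last name then first name; the function names are illustrative.

        ```python
        def name_key(name):
            # Keep only the first two tokens (assumed: last name, first name),
            # ignoring case and any middle name or initial that follows.
            return tuple(name.lower().split()[:2])


        def differing_rows(pairs):
            # Return the side-by-side pairs whose last/first names disagree.
            return [(a, b) for a, b in pairs if name_key(a) != name_key(b)]


        pairs = [
            ("Jones Santa C", "Jones Santa"),
            ("Hampton Sally E", "Hampton Sally Egg"),
            ("Jones Carl C", "Jones Karl C"),
        ]
        print(differing_rows(pairs))
        ```

        This sketch ignores the middle part entirely, so it would also treat two different middle initials as the same person, which is stricter filtering than AsiaY described earlier in the thread.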
