Forum Discussion

Biggy_E's avatar
Biggy_E
Copper Contributor
Mar 12, 2024

Help--Formulas

Hello!  I am seeking assistance creating a formula that can cross-reference two data sources.  So the issue is that one set of data has first and last names, and the second source has first name, middle name, and two last names.  Ex. Juan Penaranda on one file, Juan Jose Penaranda on another, but they are both the same person hence no anomalies.  However, I need to report the ones that are anomalies.

 

See the attached Excel file.

 

Any assistance in overcoming this issue would be greatly appreciated.

 

 

  • Biggy_E 

    Have you heard Fuzzy Lookup? It performs fuzzy matching of your two sets of data/table.

    1. Fuzzy Matching:
      • used to identify similar or approximately matching records in a dataset.
      • Unlike exact matches, where data must be identical, fuzzy matching allows for variations such as typos, misspellings, and similar phrases.
    2. Use Cases:
      • The Fuzzy Lookup Add-In can be used in two main scenarios:
        1. Duplicate Detection: Within a single table, it identifies fuzzy duplicate rows based on textual similarity.
        2. Data Integration: Between two different tables, it performs a fuzzy join to match similar rows.
    3. How It Works:
      • The add-in compares the data/text of specified columns (such as names, addresses, or product descriptions) using advanced algorithms.
      • It assigns a similarity score to each pair of records, indicating how closely they match.
      • You can then set a threshold to determine which pairs are considered matches.

    You can download this fuzzy lookup here>>> Download Fuzzy Lookup Add-In for Excel from Official Microsoft Download Center

    you can use this tutorial based on your query>>> How to Use Microsoft Excel Fuzzy Lookup - YouTube

     

    I've also attached the sample file using fuzzy lookup

     

  • Rodrigo_'s avatar
    Rodrigo_
    Steel Contributor

    Biggy_E 

    Have you heard Fuzzy Lookup? It performs fuzzy matching of your two sets of data/table.

    1. Fuzzy Matching:
      • used to identify similar or approximately matching records in a dataset.
      • Unlike exact matches, where data must be identical, fuzzy matching allows for variations such as typos, misspellings, and similar phrases.
    2. Use Cases:
      • The Fuzzy Lookup Add-In can be used in two main scenarios:
        1. Duplicate Detection: Within a single table, it identifies fuzzy duplicate rows based on textual similarity.
        2. Data Integration: Between two different tables, it performs a fuzzy join to match similar rows.
    3. How It Works:
      • The add-in compares the data/text of specified columns (such as names, addresses, or product descriptions) using advanced algorithms.
      • It assigns a similarity score to each pair of records, indicating how closely they match.
      • You can then set a threshold to determine which pairs are considered matches.

    You can download this fuzzy lookup here>>> Download Fuzzy Lookup Add-In for Excel from Official Microsoft Download Center

    you can use this tutorial based on your query>>> How to Use Microsoft Excel Fuzzy Lookup - YouTube

     

    I've also attached the sample file using fuzzy lookup

     

  • djclements's avatar
    djclements
    Bronze Contributor

    Biggy_E If you have Excel for MS365, the following dynamic array formula may help:

     

    =LET(
        data1, TRIM(B4:B39), data2, TRIM(C4:C39), delim, {" ","–"},
        LIKE, LAMBDA(array,criteria, BYROW(--ISNUMBER(SEARCH(criteria, array)), LAMBDA(r, PRODUCT(r)))),
        array1, FILTER(data1, (data1<>"")*NOT(MAP(data1, LAMBDA(n, SUM(LIKE(data2, TEXTSPLIT(n, delim)))))), ""),
        results1, FILTER(array1, NOT(MAP(array1, LAMBDA(v, SUM(MAP(data2, LAMBDA(n, LIKE(v, TEXTSPLIT(n, delim)))))))), ""),
        array2, FILTER(data2, (data2<>"")*NOT(MAP(data2, LAMBDA(n, SUM(LIKE(data1, TEXTSPLIT(n, delim)))))), ""),
        results2, FILTER(array2, NOT(MAP(array2, LAMBDA(v, SUM(MAP(data1, LAMBDA(n, LIKE(v, TEXTSPLIT(n, delim)))))))), ""),
        SORT(VSTACK(EXPAND(results1,, 2, "Data Source 1"), EXPAND(results2,, 2, "Data Source 2")))
    )

     

    It's a mouthful, to be sure. Basically, it performs a cross-directional lookup on both lists, returning the names from list 1 not found in list 2 and vice-versa. Then it stacks and sorts the results so it's easy to spot the spelling mistakes. Please note, if either of the lists contain additional special characters like commas, just add them to the delim array, for example: {" ","–",","}.

     

    Results

     

    As an alternative, you could also try:

     

    =LET(
        data1, TRIM(B4:B39), data2, TRIM(C4:C39), delim, {" ","–"},
        NOMATCH, LAMBDA(val,arr, AND(TOCOL(MAP(arr, LAMBDA(a, SUM(N(val=TEXTSPLIT(a,, delim)))<2)), 2))),
        results1, FILTER(data1, IFERROR(MAP(data1, LAMBDA(n, LET(v, TEXTSPLIT(n, delim), NOMATCH(v, data2)))), FALSE), ""),
        results2, FILTER(data2, IFERROR(MAP(data2, LAMBDA(n, LET(v, TEXTSPLIT(n, delim), NOMATCH(v, data1)))), FALSE), ""),
        SORT(VSTACK(EXPAND(results1,, 2, "Data Source 1"), EXPAND(results2,, 2, "Data Source 2")))
    )

     

    However, this method is less accurate than the first because it's only looking for two matching names within each full name. For example, if the first and middle names match, but the last name does not, it will still be considered a match.

     

    Please see the attached workbook...

Share

Resources