Mar 12 2024 01:48 PM - edited Mar 12 2024 01:51 PM
Hello! I am seeking assistance creating a formula that can cross-reference two data sources. So the issue is that one set of data has first and last names, and the second source has first name, middle name, and two last names. Ex. Juan Penaranda on one file, Juan Jose Penaranda on another, but they are both the same person hence no anomalies. However, I need to report the ones that are anomalies.
See the attached Excel file.
Any assistance in overcoming this issue would be greatly appreciated.
Mar 12 2024 07:00 PM
Solution@Biggy_E
Have you heard Fuzzy Lookup? It performs fuzzy matching of your two sets of data/table.
You can download this fuzzy lookup here>>> Download Fuzzy Lookup Add-In for Excel from Official Microsoft Download Center
you can use this tutorial based on your query>>> How to Use Microsoft Excel Fuzzy Lookup - YouTube
I've also attached the sample file using fuzzy lookup
Mar 13 2024 01:21 AM - edited Mar 13 2024 05:35 AM
@Biggy_E If you have Excel for MS365, the following dynamic array formula may help:
=LET(
data1, TRIM(B4:B39), data2, TRIM(C4:C39), delim, {" ","–"},
LIKE, LAMBDA(array,criteria, BYROW(--ISNUMBER(SEARCH(criteria, array)), LAMBDA(r, PRODUCT(r)))),
array1, FILTER(data1, (data1<>"")*NOT(MAP(data1, LAMBDA(n, SUM(LIKE(data2, TEXTSPLIT(n, delim)))))), ""),
results1, FILTER(array1, NOT(MAP(array1, LAMBDA(v, SUM(MAP(data2, LAMBDA(n, LIKE(v, TEXTSPLIT(n, delim)))))))), ""),
array2, FILTER(data2, (data2<>"")*NOT(MAP(data2, LAMBDA(n, SUM(LIKE(data1, TEXTSPLIT(n, delim)))))), ""),
results2, FILTER(array2, NOT(MAP(array2, LAMBDA(v, SUM(MAP(data1, LAMBDA(n, LIKE(v, TEXTSPLIT(n, delim)))))))), ""),
SORT(VSTACK(EXPAND(results1,, 2, "Data Source 1"), EXPAND(results2,, 2, "Data Source 2")))
)
It's a mouthful, to be sure. Basically, it performs a cross-directional lookup on both lists, returning the names from list 1 not found in list 2 and vice-versa. Then it stacks and sorts the results so it's easy to spot the spelling mistakes. Please note, if either of the lists contain additional special characters like commas, just add them to the delim array, for example: {" ","–",","}.
As an alternative, you could also try:
=LET(
data1, TRIM(B4:B39), data2, TRIM(C4:C39), delim, {" ","–"},
NOMATCH, LAMBDA(val,arr, AND(TOCOL(MAP(arr, LAMBDA(a, SUM(N(val=TEXTSPLIT(a,, delim)))<2)), 2))),
results1, FILTER(data1, IFERROR(MAP(data1, LAMBDA(n, LET(v, TEXTSPLIT(n, delim), NOMATCH(v, data2)))), FALSE), ""),
results2, FILTER(data2, IFERROR(MAP(data2, LAMBDA(n, LET(v, TEXTSPLIT(n, delim), NOMATCH(v, data1)))), FALSE), ""),
SORT(VSTACK(EXPAND(results1,, 2, "Data Source 1"), EXPAND(results2,, 2, "Data Source 2")))
)
However, this method is less accurate than the first because it's only looking for two matching names within each full name. For example, if the first and middle names match, but the last name does not, it will still be considered a match.
Please see the attached workbook...
Mar 12 2024 07:00 PM
Solution@Biggy_E
Have you heard Fuzzy Lookup? It performs fuzzy matching of your two sets of data/table.
You can download this fuzzy lookup here>>> Download Fuzzy Lookup Add-In for Excel from Official Microsoft Download Center
you can use this tutorial based on your query>>> How to Use Microsoft Excel Fuzzy Lookup - YouTube
I've also attached the sample file using fuzzy lookup