Forum Discussion
Carol369
Aug 23, 2022Copper Contributor
Excel help - how to find the differences between two similar data sheets
Hi, I hope everyone is doing GREAT.
What formula will find the differences between two data sheets with partial matches. The data columns to compare do not match, meaning one set of data uses full naming conventions while the other data set uses partial naming conventions. Example below
- i.e. ADVENT CAPITAL 21 MISSION LP B
- i.e. Advent Capital 21 Mission LP by way of its General Partner, Advent Development Corporation
Below are two samples from the HUGE data set that needs to be compared.
Data set 1
Data set - 1 |
Blue Brentwood LTD. |
Hoover's Watercarts Incorporated |
Bella's Foundation Corp. |
Lylia's Flower Emporium |
MANSFIELD COOKIES LTD. 910111213 CANADA INC. |
Brightside Inventions Inc. Howard's Plate Incorporated |
Saskatoon Marbles Limited 10010 KITIMAT ROAD LIMITED PART |
BARRELS FUN ONTARIO LTD |
123546 ONTARIO INC. AAA Stops Canada., Inc. |
32158411 Quebec Incorporated |
Kats & Dogs QUE Incorporated |
Data set 2.
Data set - 2 |
Howard's Plate Inc. |
AAA Stops Cda., Inc. |
Hoover's Watercarts Inc. |
Saskatoon Marbles Ltd. |
Bella's Foundation Corporation |
Lylia's Flower Emporium Inc. |
MANSFIELD COOKIES LIMITED |
Brightside Inventions Inc. |
Blue Brentwood Limited |
10010 KITIMAT ROAD LTD., PART |
Barrels FUN ON., LTD |
123546 ONT INC. |
910111213 CAN INC. |
32158411 QUEBEC Inc. |
Kats & Dogs QUE INC. |
I have been struggling with this for 2 weeks... I sure hope someone can help with this 😊
Thank you,
Carol369
- Patrick2788Silver Contributor
This is a difficult task because any time partial matching is involved one cannot guarantee 100% accuracy.
This 365 solution takes a given text and goes through a progression of checking a character accumulation string against list 1. For example, with Howard's Plate Inc. it checks:
H
Ho
How
Howa
Howar
Howard
Howard'
Howard's
Howard's P
Howard's Pl
Howard's Pla
Howard's Plat
Howard's Plate
Howard's Plate I
Howard's Plate In
Howard's Plate Inc
Howard's Plate Inc.
Naturally, with using partial matching it will easily find only a few letters in list 1. Max is added to select the highest number match from the list to ensure optimal accuracy.The formula includes an accuracy check. You may specify how much of the string to match.
=LAMBDA(input,acc,LET(strings,MID(input,SEQUENCE(LEN(input)),1),s,SCAN("",strings,LAMBDA(a,v,a&v)),m,MAX(IFERROR(XMATCH(s&"*",List1,2),0)),IF(m>=acc*LEN(input),m,0)))
The above Lambda is then folded into a smaller formula:
The .4 indicates how much of the string will be checked against list 1. You can adjust this as needed to get more results.
=BYROW(list2,LAMBDA(row,IDrow(row,0.4)))
I'm familiar with partial matching techniques using LEFT, RIGHT, MID and wild cards. I've seen data entered differently in two lists and I know it can be a time sink. Hopefully, this formula can be put to use.
- Carol369Copper ContributorThank you.