Excel help - how to find the differences between two similar data sheets

Copper Contributor

Hi, I hope everyone is doing GREAT.

 

What formula will find the differences between two data sheets with partial matches.  The data columns to compare do not match, meaning one set of data uses full naming conventions while the other data set uses partial naming conventions.  Example below

 

  • i.e.  ADVENT CAPITAL 21 MISSION LP B
  • i.e.  Advent Capital 21 Mission LP by way of its General Partner, Advent Development Corporation

Below are two samples from the HUGE data set that needs to be compared.

 

Data set 1   

Data set - 1

Blue Brentwood LTD.

Hoover's Watercarts Incorporated

Bella's Foundation Corp.

Lylia's Flower Emporium

MANSFIELD COOKIES LTD. 910111213 CANADA INC.

Brightside Inventions Inc.

Howard's Plate Incorporated

Saskatoon Marbles Limited

10010 KITIMAT ROAD LIMITED PART

BARRELS FUN ONTARIO LTD

123546 ONTARIO INC.

AAA Stops Canada., Inc.

32158411 Quebec Incorporated

Kats & Dogs QUE Incorporated

 

Data set 2. 

Data set - 2

Howard's Plate Inc.

AAA Stops Cda., Inc.

Hoover's Watercarts Inc.

Saskatoon Marbles Ltd.

Bella's Foundation Corporation

Lylia's Flower Emporium Inc.

MANSFIELD COOKIES LIMITED

Brightside Inventions Inc.

Blue Brentwood Limited

10010 KITIMAT ROAD LTD., PART

Barrels FUN ON., LTD

123546 ONT INC.

910111213 CAN INC.

32158411 QUEBEC Inc.

Kats & Dogs QUE INC.

 

I have been struggling with this for 2 weeks... I sure hope someone can help with this :smiling_face_with_smiling_eyes:

 

Thank you,

Carol369

 

 

2 Replies

@Carol369 

This is a difficult task because any time partial matching is involved one cannot guarantee 100% accuracy.

 

This 365 solution takes a given text and goes through a progression of checking a character accumulation string against list 1. For example, with Howard's Plate Inc. it checks:

H

Ho

How

Howa

Howar

Howard

Howard'

Howard's

Howard's P

Howard's Pl

Howard's Pla

Howard's Plat

Howard's Plate

Howard's Plate I

Howard's Plate In

Howard's Plate Inc

Howard's Plate Inc.


Naturally, with using partial matching it will easily find only a few letters in list 1.  Max is added to select the highest number match from the list to ensure optimal accuracy.

The formula includes an accuracy check. You may specify how much of the string to match.

 

 

=LAMBDA(input,acc,LET(strings,MID(input,SEQUENCE(LEN(input)),1),s,SCAN("",strings,LAMBDA(a,v,a&v)),m,MAX(IFERROR(XMATCH(s&"*",List1,2),0)),IF(m>=acc*LEN(input),m,0)))

 

The above Lambda is then folded into a smaller formula:

The .4 indicates how much of the string will be checked against list 1. You can adjust this as needed to get more results.

 

 

=BYROW(list2,LAMBDA(row,IDrow(row,0.4)))

 

I'm familiar with partial matching techniques using LEFT, RIGHT, MID and wild cards.  I've seen data entered differently in two lists and I know it can be a time sink. Hopefully, this formula can be put to use.