Fuzzy Matching in Power Query for Microsoft Excel
Published Oct 08 2019 01:07 PM 26.7K Views
Brass Contributor

Imagine this. We have been given a list of products from company A and a list of products from company B.

 

Our job is to match products between the two lists. But each company might have slightly different names for the same item.

 

This means we can't just use a lookup function because we don't have exact matches. We can't even use wildcards in a lookup function because the product names aren't different in a predictable way.

 

This is where we realize we have to manually match our products from the two lists and eventually we give up and start to cry! Right?

 

No, because we now have a fuzzy match option in Power Query. Lazy people rejoice!

 

Not sure what Power Query is? Then check out this Introduction to Power Query beforehand.

 

Example

01 List of Products.png

 

In our example, we have List A and List B. These are small and only have the one column, but imagine we're working with two large datasets with hundreds or thousands or products and many other columns.

 

We can see that none of the items in each list is an exact match, but they are likely the same product.

 

For example, in list A we have "Macaroni and cheese", but in list B we have "Macaroni & Cheese". They are the same thing but slightly different.

 

Create A Query For Both Tables

Both these lists have been converted into Excel Tables named ListA and ListB. The first thing we will need to do is create a query for each table.

  1. Select a cell inside the table.
  2. Go to the Data tab.
  3. Use the From Table/Range query command.
  4. Go to the Home tab in the Power Query editor.
  5. Select Close & Load ➜ Close & Load To.
  6. Choose Only Create Connection from the Import Data menu.

Once this is done for both tables, we'll be able to merge (join) the tables in another query.

 

Create A Merge Query Using Both Tables

Power Query has an option to merge tables.

 

This allows us to join data in two tables based on a common field like an ID column, or in our case the Product name column.

 

This is where we'll also find the new fuzzy match feature.

 

02 Merge Queries.png

 

Create a Merge query.

  1. Go to the Data tab.
  2. Choose Get Data from the ribbon.
  3. Choose Combine Queries from the menu.
  4. Choose Merge.

This will open up the Merge query menu.

 

Merge Menu Settings

 

03 Merge Menu.png

 

The Merge menu is where we can enable fuzzy matching.

  1. Select the first query we want to merge. For us, this will be ListA. We also need to select the column to base the merge on, for us this is the Product name column.
  2. Select the second query we want to merge and the column to base the merge on.
  3. We also need to select what type of join to perform. There are several choice, but we'll stick with the default option to return all items from the first list and only matching items from the second list. This is also where we can select fuzzy matching as an option, check the Use fuzzy matching to perform the merge box.

Notice at the bottom of the merge menu, Power Query tells us how many matches are found. When we enable fuzzy matches, this goes from 0 of 8 to 2 of 8.

 

Fuzzy Matching Options

We've already improved our matching by just enabling the fuzzy matching option. But there are more settings we can use to help improve the matching.

 

04 Fuzzy Matching Options.png

 

Click on the collapsed Fuzzy matching options and more advanced settings will be revealed. These can help us match more items in our lists.

 

Similarity Threshold

There's a similarity threshold we can adjust to help match more items. This is a value between 0 and 1 that indicates how similar values need to be in order to match.

  • 0 means everything will match.
  • 1 means only exact matches will match.

The default value for this will be 0.8.

 

Decreasing the threshold to 0.5 will take us from 2 of 8 matches to 5 of 8 matches.

 

Decreasing the threshold further to 0.3 will get us all 8 items matching.

 

But remember, the lower the threshold, the higher the likelihood of getting unwanted matches.

 

Ignore Case

Power Query is case sensitive. This means X and x are not considered the same value. This will be true when merging tables as well.

 

Luckily there is an option to ignore case when performing a fuzzy match merge. This is enabled by default.

 

Match by Combining Text Parts

There's an option to combine text parts to find matches. This means Power Query will treat things like "birthday" the same as "birth day". This is enabled by default.

 

Maximum Number of Matches

With fuzzy matching there is the potential to match items together that shouldn't be a match.

 

This option will help prevent unwanted matches by limiting the number of matches that are returned.

 

If we set this to 1, then Power Query will only return the best match and won't return the other matches that are still above the similarity threshold.

 

By default this is left blank and will allow up to 2,147,483,647 matches. 2 billion matches should be enough for anyone.

 

Transformation Table

Fuzzy matching has a very handy feature that allows us to set a transformation table.

 

This allows us to define pairs of values we consider the same and Power Query will treat them as equivalent during the fuzzy merge.

 

A transformation table can be created by creating a table with two column (a From and a To column).

 

05 Transformation Table.png

 

The above table will tell Power Query that Mobile and Cell should be considered the same thing when matching.

 

Expand Merged Data In The Power Query Editor

 

06 Expand Merged Table Column.png

 

When we press the OK button in the merge menu, this will open up the Power Query editor.

 

We will see our ListA Product data and a ListB column which contains a bunch of tables. These tables contain our merged data from ListB.

 

We need to expand out this ListB column to get the data. Click on the expand toggle in the right of the column heading and press the OK button.

 

07 Fuzzy Match Results.png

 

We can now Close & Load our results into a table in our workbook to get our matched results.

 

Conclusions

Matching data is a very common task.

 

Up until now, we only had ways to find exact matches.

 

Partial matching either required a lot of manual work or using wildcard characters if our data had predictable differences.

 

Now we have a quick and easy way to match data even when there isn't a common pattern to the differences.

 

What a time saver! I only wish I had this years ago.

 

 

8 Comments
Steel Contributor

Pro-Tip: Set your threshold to 1, but tell it to ignore case, and it will do an exact case-insensitive match. Saves having to always get your join columns in the same case. This will force IgnoreSpace to not be valid though if you set the threshold to 1.

Copper Contributor

@JohnMacDougall 

 

"What a time saver! I only wish I had this years ago."

You did have this in Excel since late 2010  as an Add-in called Fuzzy Lookup

https://www.microsoft.com/en-in/download/details.aspx?id=15011

 

Since then it has had exactly 1 Update in (2014)

This huge time save was neither promoted by Microsoft nor were efforts made to fully integrate it to Excel.

In PBI it came only 3 years (Oct 2018) after the "Idea" submitted on June 2015. In Excel it came even later.

 

Even in its current implementation there are some gaps. For E.g. in the Excel Addin we can have a column that gives a % indicating Jaccard's index of similarity - In PQ this column is missing. Before you suggest I have already submitted it as an Idea - But I am sure a few years from now it will get implemented

 

Also there is no way currently of clustering in to Fuzzy Groups or deleting Fuzzy Duplicates.

 

Conclusions:

MS will remain a company with unlimited potential and limited vision

 

Sam

 

 

 

 

 

Copper Contributor

Hello John,

 

I am using Excel 2016 version with the Fuzzy Lookup Add in already installed. For some reasons, I can not see the option "Use fuzzy matching to perform the merge" under the Join Kind"'s Merge field.

 

Please, let me know how I can enable this Fuzzy merging in the Powerquery.

 

Thank you,

 

Sala.

Steel Contributor

@SalaLoocki - this is not the old Fuzzy Merge addin. This is only in Office 365. If you have the perpetual licensed version of Excel 2016, you won't get it. It isn't even in Excel 2019. It will likely exist in Excel 2022 (or whenever they launch another perpetual license version), or you can get today with Office 365.

Copper Contributor

Very good. I used the earlier Fuzzy add in and was thinking about it the other day and then I came across this page today!

 

The Merge Menu Settings dialogue box is too big for my screen so I have to use the Tab key to get to and use the final two elections. That's a bit clunky and error prone, I think.

 

Transformation Table: I created on, similar to yours. I wanted & to change to and so I put & as From and and as To ... it rejected that out of hand! When I removed that row from the Transformation table, all was well except that when I said From MSFT To Microsoft, it did the opposite!

 

I support the idea of using a very simple example to demonstrate this feature so I will develop my own example and learn even more, I am sure.

Steel Contributor

It is really time saver...

Brass Contributor

I agree with @xlpbi . I just upgraded to Office 365 almost exclusively because Fuzzy Merge was built into Power Query, but quickly found that it is considerably inferior to the Fuzzy Lookup Add-in. In addition to what they mentioned about not having a column with % similarity (which is perhaps the most useful function of the whole thing), it is returning about 20% fewer matches with the same data sets and parameters. It is also AT LEAST 10x slower. Very disappointed - I'm still stuck with a hybrid of Queries and manual worksheet manipulation with the Fuzzy Lookup add-in...

 

Copper Contributor

Wonder how it works back -end, any algorithm ? 

Version history
Last update:
‎Apr 24 2020 02:04 PM
Updated by: