Imagine this. We have been given a list of products from company A and a list of products from company B.
Our job is to match products between the two lists. But each company might have slightly different names for the same item.
This means we can't just use a lookup function because we don't have exact matches. We can't even use wildcards in a lookup function because the product names aren't different in a predictable way.
This is where we realize we have to manually match our products from the two lists and eventually we give up and start to cry! Right?
No, because we now have a fuzzy match option in Power Query. Lazy people rejoice!
Not sure what Power Query is? Then check out this Introduction to Power Query beforehand.
In our example, we have List A and List B. These are small and only have the one column, but imagine we're working with two large datasets with hundreds or thousands or products and many other columns.
We can see that none of the items in each list is an exact match, but they are likely the same product.
For example, in list A we have "Macaroni and cheese", but in list B we have "Macaroni & Cheese". They are the same thing but slightly different.
Both these lists have been converted into Excel Tables named ListA and ListB. The first thing we will need to do is create a query for each table.
Once this is done for both tables, we'll be able to merge (join) the tables in another query.
Power Query has an option to merge tables.
This allows us to join data in two tables based on a common field like an ID column, or in our case the Product name column.
This is where we'll also find the new fuzzy match feature.
Create a Merge query.
This will open up the Merge query menu.
The Merge menu is where we can enable fuzzy matching.
Notice at the bottom of the merge menu, Power Query tells us how many matches are found. When we enable fuzzy matches, this goes from 0 of 8 to 2 of 8.
We've already improved our matching by just enabling the fuzzy matching option. But there are more settings we can use to help improve the matching.
Click on the collapsed Fuzzy matching options and more advanced settings will be revealed. These can help us match more items in our lists.
There's a similarity threshold we can adjust to help match more items. This is a value between 0 and 1 that indicates how similar values need to be in order to match.
The default value for this will be 0.8.
Decreasing the threshold to 0.5 will take us from 2 of 8 matches to 5 of 8 matches.
Decreasing the threshold further to 0.3 will get us all 8 items matching.
But remember, the lower the threshold, the higher the likelihood of getting unwanted matches.
Power Query is case sensitive. This means X and x are not considered the same value. This will be true when merging tables as well.
Luckily there is an option to ignore case when performing a fuzzy match merge. This is enabled by default.
There's an option to combine text parts to find matches. This means Power Query will treat things like "birthday" the same as "birth day". This is enabled by default.
With fuzzy matching there is the potential to match items together that shouldn't be a match.
This option will help prevent unwanted matches by limiting the number of matches that are returned.
If we set this to 1, then Power Query will only return the best match and won't return the other matches that are still above the similarity threshold.
By default this is left blank and will allow up to 2,147,483,647 matches. 2 billion matches should be enough for anyone.
Fuzzy matching has a very handy feature that allows us to set a transformation table.
This allows us to define pairs of values we consider the same and Power Query will treat them as equivalent during the fuzzy merge.
A transformation table can be created by creating a table with two column (a From and a To column).
The above table will tell Power Query that Mobile and Cell should be considered the same thing when matching.
When we press the OK button in the merge menu, this will open up the Power Query editor.
We will see our ListA Product data and a ListB column which contains a bunch of tables. These tables contain our merged data from ListB.
We need to expand out this ListB column to get the data. Click on the expand toggle in the right of the column heading and press the OK button.
We can now Close & Load our results into a table in our workbook to get our matched results.
Matching data is a very common task.
Up until now, we only had ways to find exact matches.
Partial matching either required a lot of manual work or using wildcard characters if our data had predictable differences.
Now we have a quick and easy way to match data even when there isn't a common pattern to the differences.
What a time saver! I only wish I had this years ago.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.