SOLVED

Pattern matching for User information rows to uniquify users

Copper Contributor

Hello there,

new to this so bear with me.

 

I got a big Excel sheet with about 5000 row entries and 10 columns of user data.

Things like, Name, Company Name, ID, Zip-Code, you name it, casual stuff.

 

Since this is user entry based there might be:

1. Spelling mistakes of the Company name

2. Different people use different shortcuts for company names

3. Some company names are used differently ( Dodger Highball Stadium for 1 person and the other enters Highball Stad Dod) for example.

 

Since i use this data in PowerBI i need the example from 3. to match each others name. 

So both are called the actual name, Dodger Highball Stadium to use the data further.

 

Is there a way with pattern matching or something like this to try to get things to work?

 

Much appreciated to any information which may lead me to a solution =) 

Thanks in advance!

3 Replies
best response confirmed by DanielWWW (Copper Contributor)
Solution

@DanielWWW Did some testing with Fuzzy Matching in Power Query. In order to match "Dodger Highball Stadium" with "Highball Stad Dod" you need to set the similarity threshold to 0.18. This means that you are going to end up with a lot of other matches as well. Entries like "Stadium Rd." or "Dodge Motor Company" or "Highba Inc.) will also be matched. Anything with large parts of either of the three words seems to be matched. Thus not very useful. 

So, you could perhaps use fuzzy matching with a higher threshold that will allow for small spelling errors. But to capture the more extreme variations, I fear you are going to need human brain power and some manual work to set up a transformation table with all possible references to one and the same location.

 

Attaching my test file. Perhaps useful.

@Riny_van_Eekelen 

Thank you Riny for the fast answer already.

I will try it out and test a little bit and will let you know.

=)

Hi @DanielWWW 

 

You can use the fuzzy match in Power Query for this issue, what I would suggest you use the FUZZY match with transformation table where you can input the value FROM and TO handle this scenario.

 

Regards, Faraz Shaikh | MCT, MIE, MOS Master, Excel Expert

If you find the above solution resolved your query don't forget mark as Official/Best Answer & like it to help the other members find it more.

1 best response

Accepted Solutions
best response confirmed by DanielWWW (Copper Contributor)
Solution

@DanielWWW Did some testing with Fuzzy Matching in Power Query. In order to match "Dodger Highball Stadium" with "Highball Stad Dod" you need to set the similarity threshold to 0.18. This means that you are going to end up with a lot of other matches as well. Entries like "Stadium Rd." or "Dodge Motor Company" or "Highba Inc.) will also be matched. Anything with large parts of either of the three words seems to be matched. Thus not very useful. 

So, you could perhaps use fuzzy matching with a higher threshold that will allow for small spelling errors. But to capture the more extreme variations, I fear you are going to need human brain power and some manual work to set up a transformation table with all possible references to one and the same location.

 

Attaching my test file. Perhaps useful.

View solution in original post