Forum Discussion

Brirack77's avatar
Brirack77
Copper Contributor
Nov 04, 2025

People data lookups

Hi,

I have created an MS Form where anyone can nominate their peers for recognition - we have 20k employees. I've added a field in the form for a unique employee number for the nominator to include. So they would submit the nominee's: emp number, full name, role, area of the business. The output comes to me in excel and I need to update a list of 00's of employees, where they work, their contact details, line manager detail etc.

The challenge is this: whilst I can do all the look-ups to the external data to provide this, the output of the form could have numerous errors. First of all, a missing  emp number or incorrect emp no. Second of all, incorrect name spelling, or shortened names, or duplicate names, or too many spaces etc. Then there's incorrect role or missing role, and finally the same for area of the business. Using either multiple IFs or look-ups or Power Query, what's the best solution to pull through the correct emp number (from my external data), when those first four fields could be all wrong, partly wrong, missing or all correct, and what about dealing with duplicate names? I can concatenate if the four fields are correct, but this isn't reliable. Is there a fuzzy match that takes the best guess at the four fields completed - leaving me to just check a handful of #N/As? I have Copilot and licence too. Thanks!

3 Replies

  • dschwarz26's avatar
    dschwarz26
    Copper Contributor

    This is a textbook entity matching problem. The below commenter suggests using COPILOT(), which you say you have a license for, but you can't actually do that in sheets on 20k rows, you'll get rate limited.

    But the idea is right. Any fuzzy match on the four fields will have a lot of false positives and false negatives that will be very hard to spot or fix. You want LLMs to do the higher quality matching.

    I would do this in three steps:

    1. Standardize with simple formulas, e.g. trim spaces, normalize case, remove extra whitespace.

    2. For entries WITH employee numbers, validate them against your master list. Flag any that don't match (typos in the ID itself).

    3. For entries WITHOUT employee numbers or with invalid ones, you need intelligent matching, e.g. 'Bob' vs 'Robert').

    So what to do for step 3? I recommend http://everyrow.io/merge. EveryRow uses LLMs under the hood to do a very high quality match. But it uses them efficiently, at about $0.01 per match (depends), so 20k rows could be done for ~$200 at very high quality. Hopefully it would be less, as steps 1 and 2 would reduce the size.

    EveryRow also pretty easy to use. Just export from Excel as a CSV, import to everyrow, run the merge, export back to CSV, and import back to Excel. Takes just a minute, plus the time for the merge (maybe half an hour for 20k rows).

    Good luck,

    -Dan

     

  • Harun24HR's avatar
    Harun24HR
    Bronze Contributor

    Can you please attach a sample file or share via OneDrive, Google-Drive, Dropbox or similar service so that we can check your data structure and type and suggest best formula for you?

  • m_tarler's avatar
    m_tarler
    Bronze Contributor

    You can do various things like use String (and RegEx) functions to check if the strings match or if the first x characters match and such but because of names being things like Robert or Bob, that could be a problem.  You mentioned Copilot and that is sor tof the way I was thinking is you could ask AI to give you best match and then double check that list.  You could also do a hybrid approach where maybe you first check if maybe both the emplyee number and last name match and if so just use that info but if not then ask copilot to try a match.  I don't have and haven't used copilot but something like this should get you started:

    =IF(last_name = XLOOKUP(emp_id, db[ids], db[last], ""), emp_id, COPILOT("Find best match for Employee ID " & emp_id &" and last name " & last_name & " and first name " & first_name " in this data set", db) )

    where last_name, emp_id, and first_name are the entries on that line and the db is a table of all the emplyee information with db[ids] being the emplyee ID column and db[last] being the emplyee Last Name column.

Resources