Forum Discussion
People data lookups
This is a textbook entity matching problem. The below commenter suggests using COPILOT(), which you say you have a license for, but you can't actually do that in sheets on 20k rows, you'll get rate limited.
But the idea is right. Any fuzzy match on the four fields will have a lot of false positives and false negatives that will be very hard to spot or fix. You want LLMs to do the higher quality matching.
I would do this in three steps:
1. Standardize with simple formulas, e.g. trim spaces, normalize case, remove extra whitespace.
2. For entries WITH employee numbers, validate them against your master list. Flag any that don't match (typos in the ID itself).
3. For entries WITHOUT employee numbers or with invalid ones, you need intelligent matching, e.g. 'Bob' vs 'Robert').
So what to do for step 3? I recommend http://everyrow.io/merge. EveryRow uses LLMs under the hood to do a very high quality match. But it uses them efficiently, at about $0.01 per match (depends), so 20k rows could be done for ~$200 at very high quality. Hopefully it would be less, as steps 1 and 2 would reduce the size.
EveryRow also pretty easy to use. Just export from Excel as a CSV, import to everyrow, run the merge, export back to CSV, and import back to Excel. Takes just a minute, plus the time for the merge (maybe half an hour for 20k rows).
Good luck,
-Dan