Jun 26 2023 10:50 AM
Hi there! I need some help with a large excel doc in office 365. It has 50,000 rows and 10 columns (the first two columns are first name and last name). The issue: names are appearing in duplicate, triplicate and more. What I need is to either move all unique row values to another worksheet OR delete all row entries that are not unique ones. Either way, i need the data in all 10 columns to remain with the row data. I've tried some "unique" formulas but none seem to result in the data in all 10 columns remaining. I can attach a sample excel with mock data if that helps. Thanks very much!
Jun 26 2023 11:00 AM - edited Jun 26 2023 11:00 AM
I can attach a sample excel with mock data if that helps.
It would help immeasurably. Attach here, or if that's not possible, put it on OneDrive or GoogleDrive with a link pasted here.
Jun 26 2023 12:19 PM
@mathetes thank you for your reply. Here is a link to a sample excel with mock data. Thanks!
Jun 27 2023 12:00 AM
Hi @tamlynn
Your link doesn't work and IMHO mock data are useless when it comes to identifying duplicates, especially with text values where the unexpected can be expected :))
With 50k rows attached is a Power Query option that does - on the first 2 columns:
- Clean as much as possible invisible/unprintable chars
- Remove extra spaces everywhere
- Capitalize 2nd column (if you want Last Name this can be changed easily)
NB
How to use:
- Download the attached file
- Copy/Paste as values your data in TableSource (sheet Data Source)
(you can rename the table column names the query will auto. adjust)
- Go to Excel Data tab > Refresh All
You get 2 results: Distincts & Duplicates
(In TableSource examples all colored cells have extra spaces here & there)
Jun 27 2023 05:53 AM