May 31 2023 05:37 AM
So I have an excel file with over 4000 entries. Each entry has an ID on the fourth column "D", and base on this column there are many entries with the same ID (aka duplicates). Each copy of duplicates may have columns with the same data and columns with different data (ex: some are empty cells in one copy, but filled in another copy).
So my objective is to merge each duplicated row (based on ID) without losing any data. I don't want to just delete the duplicates and for example keep the first copy of each set, because the second or third copy may have data or columns not filled in the first copy.
Also if it is the same exact data (ex: lets say in column labeled "degree" both copy 1 and 2 have data "Bachelor") only write it once, otherwise merge them with a comma or semicolon in between.
Is this at all possible to do without having to manually go through each entry?
Thanks
Jun 01 2023 10:21 AM - edited Jun 01 2023 09:21 PM
You probably have an option with Power Query - see attached sample that shouldn't be far from what you expect assuming I understood it all
Jun 01 2023 10:52 AM
When you have duplicate entries in a dataset and you want to merge them without losing any data, there are a few steps you can follow:
Identify the duplicate entries: Start by identifying the duplicate entries in your dataset. Look for fields or criteria that determine the duplication.
Determine the desired merged result: Decide how you want to merge the duplicate entries. Consider which fields should be prioritized or combined, and how the merged entry should look.
Create a new entry: Create a new entry in your dataset to hold the merged data. This entry will combine the information from the duplicate entries.
Merge the data: Copy the relevant data from each duplicate entry into the new merged entry. You may need to make decisions on how to handle conflicts or inconsistencies between the duplicate entries.
Preserve unique data: If there are fields or data unique to each duplicate entry, consider appending that data to the merged entry. You may choose to include it as separate fields or consolidate it in a meaningful way.
Update references: If other records or systems reference the duplicate entries, make sure to update those references to point to the new merged entry.
Delete duplicate entries: Once you have merged the data and verified the accuracy of the merged entry, you can safely delete the duplicate entries from your dataset.
It's important to exercise caution and create backups of your data before performing any merging or deletion operations. This ensures that you have a safety net in case any unexpected issues arise during the process.
Jun 02 2023 07:50 AM
Jun 02 2023 08:49 AM
How do I use it?
#1 Make sure you run Excel > 2013 on Windows
#2 Download the attached file
#3 Rename the columns in the blue Table (in sheet Input) as you want but column ID and add any additional columns you need
#4 Copy/paste your data in the blue Table
#5 Switch to sheet Output
#6 Right-click in the green Table > Refresh
Jun 02 2023 11:40 AM
Jun 03 2023 03:15 PM
Jun 04 2023 12:06 AM
No way to identify what causes the issue with an error message only. Please attach your file to your next reply. If you can't attach a file upload & share it with any file sharing service (OneDrive, Google Drive...) and post the shared link
Jun 04 2023 12:49 AM
@Lorenzo The thing is, the file contains sensitive information which I am not allowed to share.
If it is not possible to this without sharing, then I'll just have to do it manually, but thanks for trying.
Jun 04 2023 01:47 AM
Jun 04 2023 02:49 AM
I noticed 2-3 columns completely empty for some reason Many reasons can lead to this
I do understand the sensitivity of the data. On the other hand anonymizing (i.e. Names) shouldn't take too much time
Up to you...