Merging duplicate entries without losing data

So I have an Excel file with over 4000 entries. Each entry has an ID in the fourth column, "D", and based on this column there are many entries with the same ID (i.e., duplicates). Each copy of a duplicate may have some columns with the same data and some with different data (e.g., a cell that is empty in one copy but filled in another).
So my objective is to merge each set of duplicated rows (based on ID) without losing any data. I don't want to just delete the duplicates and, for example, keep the first copy of each set, because the second or third copy may have data in columns that are not filled in the first copy.
Also, if the data is exactly the same (e.g., let's say in the column labeled "degree" both copy 1 and copy 2 contain "Bachelor"), write it only once; otherwise, merge the values with a comma or semicolon in between.

Is this at all possible to do without having to manually go through each entry?

Thanks

@3zafandi 

You probably have an option with Power Query - see the attached sample, which shouldn't be far from what you expect, assuming I understood it all correctly.
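
Since the attached sample isn't visible here, below is a minimal sketch of what such a query could look like in Power Query M, assuming the source data sits in an Excel table named Table1 and the key column is literally named ID (both names are placeholders, not taken from the actual sample). It groups the rows by ID and, for every other column, keeps the distinct non-empty values joined with "; ":

```
let
    // Load the source table (placeholder name "Table1")
    Source = Excel.CurrentWorkbook(){[Name = "Table1"]}[Content],

    // Every column except the key is a value column to be merged
    ValueColumns = List.RemoveItems(Table.ColumnNames(Source), {"ID"}),

    // Group rows by ID; for each group build a record holding,
    // per column, the distinct non-empty values joined with "; "
    Grouped = Table.Group(
        Source,
        {"ID"},
        {{"Merged", each
            Record.FromList(
                List.Transform(
                    ValueColumns,
                    (col) => Text.Combine(
                        List.Distinct(
                            List.Select(
                                List.Transform(Table.Column(_, col), Text.From),
                                each _ <> null and _ <> ""
                            )
                        ),
                        "; "
                    )
                ),
                ValueColumns
            ),
            type record
        }}
    ),

    // Expand the merged record back into ordinary columns
    Result = Table.ExpandRecordColumn(Grouped, "Merged", ValueColumns)
in
    Result
```

List.Distinct is what makes identical values (e.g. "Bachelor" in two copies) appear only once, and the "; " separator can be swapped for a comma if preferred.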

@3zafandi 

When you have duplicate entries in a dataset and you want to merge them without losing any data, there are a few steps you can follow:

  1. Identify the duplicate entries: Start by identifying the duplicate entries in your dataset. Look for the field or criteria that determines the duplication (here, the ID in column D); a quick Power Query sketch for this is shown at the end of this reply.

  2. Determine the desired merged result: Decide how you want to merge the duplicate entries. Consider which fields should be prioritized or combined, and how the merged entry should look.

  3. Create a new entry: Create a new entry in your dataset to hold the merged data. This entry will combine the information from the duplicate entries.

  4. Merge the data: Copy the relevant data from each duplicate entry into the new merged entry. You may need to make decisions on how to handle conflicts or inconsistencies between the duplicate entries.

  5. Preserve unique data: If there are fields or data unique to each duplicate entry, consider appending that data to the merged entry. You may choose to include it as separate fields or consolidate it in a meaningful way.

  6. Update references: If other records or systems reference the duplicate entries, make sure to update those references to point to the new merged entry.

  7. Delete duplicate entries: Once you have merged the data and verified the accuracy of the merged entry, you can safely delete the duplicate entries from your dataset.

It's important to exercise caution and create backups of your data before performing any merging or deletion operations. This ensures that you have a safety net in case any unexpected issues arise during the process.
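
For step 1, here is a minimal Power Query sketch that lists which IDs occur more than once, again assuming the data lives in a table named Table1 with the key column named ID (placeholder names):

```
let
    // Load the source table (placeholder name "Table1")
    Source = Excel.CurrentWorkbook(){[Name = "Table1"]}[Content],

    // Count how many rows share each ID
    Counts = Table.Group(Source, {"ID"}, {{"Rows", each Table.RowCount(_), Int64.Type}}),

    // Keep only the IDs that appear more than once, i.e. the duplicates
    Duplicates = Table.SelectRows(Counts, each [Rows] > 1)
in
    Duplicates
```

Loading this as a separate query gives a checklist of the IDs that will be merged, which also makes it easier to verify the merged result before deleting anything (step 7).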

Yes, this is exactly what I need, but on a larger scale. How do I use it?

@3zafandi 

How do I use it?

#1 Make sure you run a version of Excel newer than 2013 on Windows

#2 Download the attached file

#3 Rename the columns in the blue Table (in sheet Input) as you want, except for column ID, and add any additional columns you need

#4 Copy/paste your data in the blue Table

#5 Switch to sheet Output

#6 Right-click in the green Table > Refresh

It gave me this error when I refreshed
[Expression.Error] An error occurred while accessing table Table1 because it contains overflow errors. Please fix the errors and try again.
If possible, upload your file, which may be helpful for troubleshooting.

@3zafandi 

There is no way to identify what causes the issue from the error message alone. Please attach your file to your next reply. If you can't attach a file, upload and share it with any file-sharing service (OneDrive, Google Drive...) and post the shared link

@Lorenzo The thing is, the file contains sensitive information that I am not allowed to share.
If it is not possible to do this without sharing, then I'll just have to do it manually, but thanks for trying.

Update: I tried the file again with only the first 1000 rows and it worked, but some data was missing. I noticed 2-3 columns were completely empty for some reason.

@3zafandi 

I noticed 2-3 columns completely empty for some reason

Many reasons can lead to this.

I do understand the sensitivity of the data. On the other hand, anonymizing it (e.g. the names) shouldn't take too much time.

Up to you...