Remove duplicates and keep based on other column values

Copper Contributor

Hello,

 

I've got a sheet that has addresses listed for people by unique ID. What I would like to end up with is a sheet that has each unique ID once, and has kept the "best" address for each person.

 

I've got a column of address types. Is there a way to tell Excel "remove duplicates keeping mailing, if no mailing keeping permeant, if no permeant keep business" or something of that nature?

 

I've also got an "address valid from" column, would it be possible to have it keep the most recent address for each unique ID?

 

To add some fun complexity, there are addresses shared by multiple unique IDs.

1 Reply

@isaac845 

 

Rather than removing duplicates, set up your data so that you can filter for "best".

 

You can add a column to indicate priority of address type. 

 

Suppose your data are formatted as a Table, and your address type column is address_type, then you could add a column called address_type_priority:

 

=SWITCH([@[address_type]],"Mailing",1,"Permanent",2,"Business",3,99)

 

This would give you a numeric priority for each address type. 

 

Then, you can add a column called is_priority, which compares the current row's address_type_priority with the minimum available for that unique ID:

 

=[@[address_type_priority]]=MINIFS([address_type_priority],[unique_id],[@[unique_id]])

This would return TRUE for those addresses which are the "best" according to the priorities you outlined and FALSE otherwise. So, if a unique ID has no mailing address but does have a permanent address and does have a business address, then the minimum priority would be 2 (for the permanent address) and this is_priority column would return TRUE for that row and FALSE for the business address row.

 

At this point, you can filter the table on the is_priority column to retrieve those rows which are best according to the priorities outlined. 

 

With regards to your point about "address valid from" column - do you want this to override the issue of priority or to be used after the priority? 

 

Either way, the principle could be the same, you could create a column called is_max_valid_from, with formula:

 

=[@[address valid from]]=MAXIFS([address valid from],[unique_id],[@[unique_id]])

 

You can then use some combination of the is_priority column and the is_max_valid_from column to get the rows you want.