Remove duplicates and keep based on other column values

Iron Contributor

Oct 04, 2022

Rather than removing duplicates, set up your data so that you can filter for "best".

You can add a column to indicate priority of address type.

Suppose your data are formatted as a Table, and your address type column is address_type, then you could add a column called address_type_priority:

=SWITCH([@[address_type]],"Mailing",1,"Permanent",2,"Business",3,99)

This would give you a numeric priority for each address type.

Then, you can add a column called is_priority, which compares the current row's address_type_priority with the minimum available for that unique ID:

=[@[address_type_priority]]=MINIFS([address_type_priority],[unique_id],[@[unique_id]])

This would return TRUE for those addresses which are the "best" according to the priorities you outlined and FALSE otherwise. So, if a unique ID has no mailing address but does have a permanent address and does have a business address, then the minimum priority would be 2 (for the permanent address) and this is_priority column would return TRUE for that row and FALSE for the business address row.

At this point, you can filter the table on the is_priority column to retrieve those rows which are best according to the priorities outlined.

With regards to your point about "address valid from" column - do you want this to override the issue of priority or to be used after the priority?

Either way, the principle could be the same, you could create a column called is_max_valid_from, with formula:

=[@[address valid from]]=MAXIFS([address valid from],[unique_id],[@[unique_id]])

You can then use some combination of the is_priority column and the is_max_valid_from column to get the rows you want.

Forum Discussion

Remove duplicates and keep based on other column values