"Smart" deduplication
I am working with a spreadsheet of around 100K rows. About 30K of them are duplicates and need to be removed. Most people will say to just use the deduplication feature, and this is probably what I will do.
However, for each duplicate pair, one row has more information in it than the other, so when I deduplicate I need to make sure the row with the extra information remains and the other row is removed. Luckily, there is a column whose value identifies the row I want to keep.
Below is a very simplified version of the bigger table. The DNS and IP columns are how I find the duplicates. I then want to keep the row with more information, or, put another way, keep the rows with "AGENT" in the Tracking column.
DC | DNS | Scan | IP | Country | ID | Tracking |
unknown | hostname1 | central | 192.168.0.1 | US | 86171 | AGENT |
local | hostname1 | local | 192.168.0.1 |  | 86171 | SCAN |
unknown | hostname1 | central | 192.168.0.2 | US | 86172 | AGENT |
local | hostname1 | local | 192.168.0.2 |  | 86172 | SCAN |
unknown | hostname1 | central | 192.168.0.3 | US | 86173 | AGENT |
local | hostname1 | local | 192.168.0.3 |  | 86173 | SCAN |
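The selection rule described above ("duplicates share DNS and IP; keep the one with AGENT in Tracking") can be sketched outside Excel to make it explicit. This is a minimal Python illustration on the sample rows only, not an Excel feature; the function name `dedupe_keep_agent` and its parameters are hypothetical. It assumes that when no row in a group has AGENT, the first row seen should be kept.

```python
# Sample rows from the simplified table (Country is blank on the "local" rows).
HEADER = ("DC", "DNS", "Scan", "IP", "Country", "ID", "Tracking")
DATA = [
    ("unknown", "hostname1", "central", "192.168.0.1", "US", "86171", "AGENT"),
    ("local", "hostname1", "local", "192.168.0.1", "", "86171", "SCAN"),
    ("unknown", "hostname1", "central", "192.168.0.2", "US", "86172", "AGENT"),
    ("local", "hostname1", "local", "192.168.0.2", "", "86172", "SCAN"),
    ("unknown", "hostname1", "central", "192.168.0.3", "US", "86173", "AGENT"),
    ("local", "hostname1", "local", "192.168.0.3", "", "86173", "SCAN"),
]
ROWS = [dict(zip(HEADER, r)) for r in DATA]


def dedupe_keep_agent(rows, key_cols=("DNS", "IP"),
                      prefer_col="Tracking", prefer_value="AGENT"):
    """Hypothetical helper: keep one row per key, preferring the AGENT row.

    If no row in a group has the preferred value, the first row seen is kept
    (an assumption, not something the original question specifies).
    """
    chosen = {}  # key tuple -> row kept so far; dicts keep first-seen key order
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        if key not in chosen:
            chosen[key] = row  # first row seen for this DNS+IP pair
        elif (row[prefer_col] == prefer_value
              and chosen[key][prefer_col] != prefer_value):
            chosen[key] = row  # swap in the preferred (AGENT) row
    return list(chosen.values())


kept = dedupe_keep_agent(ROWS)
for row in kept:
    print(row["IP"], row["Tracking"])
```

Because the preferred row replaces whatever was kept earlier, the result does not depend on whether the AGENT or SCAN row comes first in the sheet.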
I do the deduplication using the DNS and IP columns, but I don't see how to tell Excel which row to keep. How does Excel choose? Does it just keep the first row and delete the second, for example?
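If a keep-first rule is what happens (an assumption here, not something the question confirms), then row order decides the outcome, and sorting so that AGENT rows sit above their SCAN counterparts before deduplicating would keep the right row. The Python sketch below models that idea; `remove_duplicates_keep_first` is a hypothetical stand-in for a keep-first duplicate removal, not Excel's actual implementation. It uses reversed sample data to simulate the SCAN rows appearing first.

```python
HEADER = ("DC", "DNS", "Scan", "IP", "Country", "ID", "Tracking")
DATA = [
    ("unknown", "hostname1", "central", "192.168.0.1", "US", "86171", "AGENT"),
    ("local", "hostname1", "local", "192.168.0.1", "", "86171", "SCAN"),
    ("unknown", "hostname1", "central", "192.168.0.2", "US", "86172", "AGENT"),
    ("local", "hostname1", "local", "192.168.0.2", "", "86172", "SCAN"),
    ("unknown", "hostname1", "central", "192.168.0.3", "US", "86173", "AGENT"),
    ("local", "hostname1", "local", "192.168.0.3", "", "86173", "SCAN"),
]
ROWS = [dict(zip(HEADER, r)) for r in DATA]
KEY = ("DNS", "IP")


def remove_duplicates_keep_first(rows, key_cols):
    """Hypothetical model of keep-first deduplication: top-most row per key wins."""
    seen = set()
    out = []
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out


# Simulate a sheet where the SCAN row of each pair happens to come first.
scan_first = list(reversed(ROWS))

# Deduplicating as-is keeps the SCAN rows, which is the wrong outcome.
naive = remove_duplicates_keep_first(scan_first, KEY)

# Sort AGENT rows to the top first (False sorts before True); sorted() is
# stable, so rows otherwise keep their relative order.
ordered = sorted(scan_first, key=lambda r: r["Tracking"] != "AGENT")
fixed = remove_duplicates_keep_first(ordered, KEY)

print([r["Tracking"] for r in naive])
print([r["Tracking"] for r in fixed])
```

The equivalent in the spreadsheet would be to sort on the Tracking column so AGENT rows come first, then run duplicate removal on DNS and IP, provided the keep-first assumption holds.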
Thanks.