SOLVED

Finding duplicates in a large excel sheet

Copper Contributor

I have a very large spreadsheet of approximately 316,000 rows. I have used the conditional formatting tool to help me find duplicates. My next step is to use the find tool and search for the duplicate records based on conditional formatting of the cell. It seems as if the duplicate tool has overlaid the cell formatting and not applied the change to the cell therefore the find tool cannot be used to locate the duplicate records. I have verified this by looking at this cell format and the text that supplied by the duplicate tool does not show up and is therefore just overlaid on the cell and not applied to it. My question is twofold one is there a way to make the overlay permanent? And secondly is there a better way to Search for duplicates in my large spreadsheet.

I appreciate any assistance.

16 Replies

@GMKubala 

After applying conditional formatting to highlight duplicates, you can turn on Filter, then select Filter by Color to select only the duplicates.

@Hans Vogelaar I tried that in the formatting from the duplicate tool does not apply to the cells it basically overlays the color onto the cell.  If I manually apply the color to the cell then the find tool works only on the one that I manually apply the color. The find tool skips over any color that is applied by the find duplicate tool.  I hope that makes sense.

You just want to highlight the duplicated rows or remove them.

To remove the duplicated rows, from Data tab --> Data tools --> click Removed duplicates.
To just find the duplicated rows, you can used vlookup function

@Jihad Al-Jarady  I can't remove the duplicate rows, they provide necessary information that will be used to review another application.  My goal is to quickly find them.

@GMKubala 

I agree with using the conditional formatting to highlight all duplicates. From there you can create a new Helper column and input a 1 for all rows that show up in the filter to make it easier to identify all rows that are duplicates.

 

1. Conditional Format

adversi_0-1607541187931.png

 

2. Filter by Color

adversi_1-1607541238237.png

 

3. Helper Column

adversi_2-1607541267941.png

 

@adversi Thanks I'll give that a try.  That will, for sure solve my issue.

@adversi How did you get the "1" in the duplicate column?

@adversi Disregard my last message I see what you did. 

best response confirmed by GMKubala (Copper Contributor)
Solution

@GMKubala

Manual input once you filter by color, you copy and paste to the last row (instead of dragging). 

Then you can Clear the Conditional Formatting to revert the file size back to normal

 

adversi_0-1607543850128.png

 

@adversi I can't thank you enough, this helped me tremendously.   When I turned on the color filter, it took literally 30 minutes for the 316,000 line spreadsheet to display.  

@adversi @GMKubala 

Hi there, can you explain this a little better? I have 109,749 rows of a spreadsheet that I need to check/view duplicates in order to correct them.

@rkozak Can you tell me what step you're lost in? The breakdown before with screenshots should cover all the steps

I believe I understand what you're doing; you click the dropdown box where you can select variables to be included in your filter/sort for that given column. It is there that I can see filter by colour. I have tried this on another smaller list and it works easily.

Unfortunately, due to the sheer size of the file my computer always freezes and is unable to process even opening the drop down box because we have 109,749 unique values in that column. These are where the duplicate values are the utmost of importance because they actually correspond to serial numbers we intend to use.

@rkozak an alternative to avoid filtering and coloring can be to use the COUNTIF function

=COUNTIF(A:A,A10)-1

 

Any figure in the helper column that is greater than 0 has a duplicate.

adversi_0-1618530413742.png

 

One other way to get results without watching Excel spin for 20 minutes: You can sort the sheet by that column before doing anything else, and use a pretty simple If formula to check the values above and below to flag duplicates.

=IF(OR(D2=D1,D2=D3),"Y","")

 

hookylife_0-1644965837220.png

 

@GMKubala have you solved your problem. Do we have about 100k rows belonging to the  CPS tester

how do I solve this problem? please also share with me the method which solved your problem.

1 best response

Accepted Solutions
best response confirmed by GMKubala (Copper Contributor)
Solution

@GMKubala

Manual input once you filter by color, you copy and paste to the last row (instead of dragging). 

Then you can Clear the Conditional Formatting to revert the file size back to normal

 

adversi_0-1607543850128.png

 

View solution in original post