SOLVED

Delete Duplicate Rows With Text Arranged in Different Columns

Brass Contributor

Hello, I have built 500 NBA fantasy team lineups composed of 8 players each.

 

Some players can be designated in multiple positions (different columns), and this leads to duplicate teams.

 

For example, on Rows 105:106 of the attached, the two teams are composed of identical players in different positions. I only want one lineup, not two lineups of the same players.

To clarify, Cell A105 contains "Malik Monk (13878917)" in the PG position, and Cell F106 contains "Malik Monk (13878917)" in the G position (Column headers in Row 1). 

As you can see, the team lineups in Row 105 and 106 contain identical players, but they are just arranged in different columns (A and F). I'd like to delete the duplicate row. 

To further clarify, I would like to delete all duplicates rows in this worksheet, not just adjacent rows, and to delete rows whether teams are duplicated in the same column or not. For example, Row 2 and Row 501 are identical and arranged in the same column as well, but only one should be kept.

My preference is to retain the first row. So in the above example with Malik Monk, I would prefer to keep Row 105 and delete 106.

Another preference is to delete duplicate rows, not just the data itself -- no empty rows between data.

 

Also, it's important to retain the order of the rows. Teams in the higher rows have higher projected points. For example, the team in Row 2 has higher projected points than Row 3. I'd like to retain the order.

In the attached file, there are many duplicate rows. So I would not be surprised if the 500 teams gets narrowed down to 200 or even 100. 

I very much appreciate the help!

11 Replies

@steveccc 

I understood the following:

1. You want to remove the rows which has same line up (Players in same position)

Use Concatenate function to combine all cells in the row. Once combined use conditional formating to find duplicate. It will help you find same player in same position entered twice or more.

Use remove duplicate to remove the duplicate one.

 

2. "For example, on Rows 105:106 of the attached, the two teams are composed of identical players in different positions. I only want one lineup, not two lineups of the same players."

Please explain?????

 

See Colown I of the sheet attached

 

 

 

 

@CA_PUNIT_AGARWAL Thanks CA_Punit. This is a clever solution. It works well when players are in the same column. But it does not work when they appear in different columns.

 

Look at the teams in Row 105 and 106 for example -- they are identical.

These two teams have the exact same players as one another -- but the players appear in a different order from one another (different column headers).

Malik Monk appears in A105 and F106.
James Harden appears in F105 and B106.
Malcolm Brogdon appears in B105 and A106. 


The same 8 players in Row 105 appear in Row 106. They are in just different order.

How can I filter out and delete Row 106, and others like it? Thanks!

best response confirmed by steveccc (Brass Contributor)
Solution

 @steveccc 

 

I have tried to solve your query:

Solution Steps:

 

1. Firstly i have assigned numbers to each unique person name (in Sheet 1)

2. I have used index and Match function to assign values (in sheet 1) to each cell

3. I add up those values. Now since every person is assigned different values so if sum of values are equal in different rows than they are identical (Persons in rows with same or different positions)

 

Please check it out

 

@CA_PUNIT_AGARWAL 

 

Works perfectly, thank you!

This is a very clever solution. I did not think about assigning numbers to text values. What I especially like about it is that it's simple enough for me to immediately understand and add to my skill set.  

@CA_PUNIT_AGARWAL  Hello, I've been testing the worksheet and ran into three issues.

1) False positives.
For example, in the attached updated spreadsheet, a lineup in Row 82 and a lineup in Row 145 are not identical, but they have the same number in Column Q.

I tried using the RAND function to make the numbers more uneven so less of a chance of matching, but I still get false positives.

2) Retain row structure.
I don't want to use the "Remove Duplicates" function because it deletes the Row. I actually just want to clear all the data in that row. Look at Row 3 and Row 10 for example. Row 10 contained the duplicate lineup, and I cleared that row manually. 

3) Speed. How do we do this quickly? I want to be able to filter and clear out these duplicate rows almost immediately. 

 

Thanks for your help!

@steveccc 

I'm not sure that works correctly. =1+2+7 and =1+3+6 gives the same sum, but their different number in each set. And here is sample from previous file if sort players ID:s which sum

image.png

 

I'd create helper range with names sorted alphabetically in each row, after that we may select them together and remove duplicates based on helper table, and like. 

In attached file it's from column I. To simplify formulas columns index is added in first row.

image.png

 In J2

=INDEX($A2:$H2,1,MATCH(J$1,COUNTIF($A2:$H2,"<="&$A2:$H2),0))

and drag it to the right and down. After that you may play with filters or physically Data->Remove Duplicates.

That could be bit easier with Power Query or Dynamic Arrays, but as I remember for this project are formulas only.

@Sergei Baklan 

Thanks Sergei. I want to retain the original order of the lineups, and not sure how to do that with the solution you just posted.

I took your worksheet from above, and concatenated the helper column data in Column R. Then I added a TRUE/FALSE helper column in Column S. (see attached).

I know how to use filter to delete all the matches indicated by TRUE in column S. However, I don't understand how I can use that to clear duplicate lineup data in Columns A:H.

I'm using the word "clear" intentionally. I don't want to delete the duplicate rows. I want to clear the data from the rows in order to retain the order in which the original data was inputted. Thanks for taking a look.

@steveccc 

To remove duplicates select data from column A to column R, unselect all, and select only R (or inwhich column you have duplicates).

image.png

Cleaning the data is not so easy with formulas. If only generate one more range into which by formulas copy information from initial one substituting it by empty texts for duplicated records.

 

However, Remove Duplicates shall not change the order of records, it keeps first met record and remove all duplicates which are after it.

S@Sergei Baklan 

 

Sir,

 

As I have selected the numbers serially so the formula has turned out to be incorrect.

So if we follow maths and create numbering in such a way numbers cannot add up in two or more variable

 

For Eg if we assign 1, 37,73 on a difference of 36 (36 because total no of unique name is 36) so number cannot add up if we select different variables.

 

I think it will work and solve the purpose in simple way.

@CA_PUNIT_AGARWAL 

If continue with numbers we have 1, 37, 73, 109. But with 1+109 = 37+73 we have different pairs of players.

 

Anyway, perhaps it could be the combination which works, but from my point of view that more complex than simple INDEX/MATCH. You have to extract unique names for each data set (we have no dynamic arrays here) or keep and update the list of all players in North America league with their own numbers.

 

I understand thanks. Was hoping there was an easy way to retain the original Row. So for example if we have this:
Row 1: Apple
Row 2: Apple
Row 3: Orange

After removing duplicates it would look like this:
Row 1: Apple
Row 2:
Row 3: Orange

That is, Row 2 would now be blank instead of deleted.
1 best response

Accepted Solutions
best response confirmed by steveccc (Brass Contributor)
Solution

 @steveccc 

 

I have tried to solve your query:

Solution Steps:

 

1. Firstly i have assigned numbers to each unique person name (in Sheet 1)

2. I have used index and Match function to assign values (in sheet 1) to each cell

3. I add up those values. Now since every person is assigned different values so if sum of values are equal in different rows than they are identical (Persons in rows with same or different positions)

 

Please check it out

 

View solution in original post