SOLVED

Remove Duplicates from multiple columns

Copper Contributor

I have data in which i have duplicates but in different columns for ex :

Coulmn AColumn B
12
21
34
43

but now my output should look like

Column AColumn B
12
34

how can i achieve this using Excel or Sql ?

1 Reply
best response confirmed by gouravsaini (Copper Contributor)
Solution

@gouravsaini 

This is one of the situations is made difficult by the fact that Excel array formulas throw a wobbly when faced with nested arrays.  Clearly one would like to sort each row and then extract the unique rows but that results in a completely unnecessary #CALC! error.  

One workaround is to concatenate the sorted rows and apply the UNIQUE function to the resulting strings.  I then search for the distinct strings within the full set to return the index of the first occurrence.  From there, back to the original data knowing the row numbers.

= LET(
     joined, BYROW(data, LAMBDA(r, TEXTJOIN("|",,SORT(r,,,1)))),
     ptr,    XMATCH(UNIQUE(joined),joined),
     INDEX(data,ptr,{1,2})
  )

The stated reason for the limitation is that "Excel has never supported arrays of arrays", but then, the support for arrays was somewhat dubious.  So, in order to ensure backward compatibility with legacy solutions, we make the present difficult and store up 'backward compatibility' problems for the future.

1 best response

Accepted Solutions
best response confirmed by gouravsaini (Copper Contributor)
Solution

@gouravsaini 

This is one of the situations is made difficult by the fact that Excel array formulas throw a wobbly when faced with nested arrays.  Clearly one would like to sort each row and then extract the unique rows but that results in a completely unnecessary #CALC! error.  

One workaround is to concatenate the sorted rows and apply the UNIQUE function to the resulting strings.  I then search for the distinct strings within the full set to return the index of the first occurrence.  From there, back to the original data knowing the row numbers.

= LET(
     joined, BYROW(data, LAMBDA(r, TEXTJOIN("|",,SORT(r,,,1)))),
     ptr,    XMATCH(UNIQUE(joined),joined),
     INDEX(data,ptr,{1,2})
  )

The stated reason for the limitation is that "Excel has never supported arrays of arrays", but then, the support for arrays was somewhat dubious.  So, in order to ensure backward compatibility with legacy solutions, we make the present difficult and store up 'backward compatibility' problems for the future.

View solution in original post