Filtering unordered records

Trusted Contributor

image.png

 

This post contains a number of solutions to a problem of sorting arbitrarily ordered colours to pick out distinct combinations.  I start with the formula as it should be, which sadly gives rise to the #CALC! 'Nested Arrays are not supported' error.  Since the solution to the problem is an array of arrays workarounds are required.

 

I have presented 4 possibilities.

1. Sort and combine each record using TEXTJOIN.  Use resulting string to return array from the original data;

2. Sort and combine each record into fixed length strings using Lambda functions.  Separate distinct combinations using the MID function acting of the array;

3. Sort and combine distinct records using REDUCE and VSTACK;

4. Sort (multiple times) and recombine distinct records using MAKEARRAY.

Which do you consider to be preferable?

Do you have other ideas?

7 Replies

@Peter Bartholomew 

That's interesting to play with it

sortRows=
LAMBDA( range,
    LET(
        topRow, SORT( CHOOSEROWS( range, 1),,, 1 ),
        IF( ROWS(range) = 1,
            topRow,
            VSTACK( topRow, sortRows( DROP( range, 1 )))
        )
));

distinctSortedRows=
LAMBDA( range,
    UNIQUE( sortRows( range )));

checkSameRows=
LAMBDA( horizontalVector, array,
    BYROW( array, LAMBDA( r, XOR( r = horizontalVector) )) );

getCountriesForColourCombination=
LAMBDA( horizontalVector,
    TEXTJOIN(",",1,
        IF( checkSameRows( horizontalVector, sortRows(colours)  ),
            country,""))
);

countriesByColourCombinations=
LET(
  distinctColours, distinctSortedRows( colours ),
  HSTACK( distinctColours,
    BYROW( distinctColours, getCountriesForColourCombination ) )
);

@Peter Bartholomew 

My first go at it I used CODE to identify where color combinations repeated.

Created a custom function to obtain the SUM of code by row:

 

=LAMBDA(row,SUM(CODE(MID(row,SEQUENCE(LEN(row)),1))))

 

 

Used the function in a formula to obtain the results. The formula is a bit lengthy and unkept at the moment, but it seems to work:

 

 

=LET(CodeTotals,BYROW(colours,LAMBDA(row,SumCode(TEXTJOIN("",,row)))),ObtainRows,XMATCH(UNIQUE(CodeTotals),CodeTotals),CHOOSEROWS(colours,ObtainRows))

 

The solution might be a bit more elegant, but I ran into the array of arrays limitation along the way where an XLOOKUP could've finished it off.

@Sergei Baklan 

Nice use of XOR, not seen that before...

 

@Peter Bartholomew 

I'd consider creating a generic version of BYROW that supports arrays

= LET(
   sorted, BYROWλ(colours,
      LAMBDA(r, SORT(r,,,1))),
   UNIQUE(sorted))

BYROWλ
=LAMBDA(array,function,
     REDUCE(1,
         SEQUENCE(ROWS(array)),
         LAMBDA(A,i,
              IF(SEQUENCE(i)=i,
                  function(INDEX(array,i,)),
                  A))))

And hope that BYROWλ could be replaced by the real thing in the not too distant future!

 

In beta version, another option could be to unpivot and sort by column,

=LET(
unpivot, HSTACK(TOCOL(colours),TOCOL(IF({1},country,colours))),
sorted,  WRAPROWS(TAKE(SORT(unpivot,{2,1}),,1),COLUMNS(colours)),
UNIQUE(sorted)
)

 

@lori_m 

SORT( array, {2,1} ) is where I stopped previous time, thank you.

universalBYROW() - yes, I don't thin it'll be something out of the box in nearest future.

@Patrick2788 

Summing character codes appears to work in this case but not sure it generalises
eg "ad" and "bc" could return the same result (197)

 

A related idea could be to assign binary values to each colour. 

i.e. black=1, blue=2, gold=4, green=8, red=16, white=32

 

=LET(
colour,SORT(UNIQUE(TOCOL(colours))),
code,2^SEQUENCE(ROWS(colour),,0),
HSTACK(colour,code))

 

 Summing colour codes by row could generate a sort key independent of order (assuming colours don't repeat)

Sound a bit like generating hash codes.

@lori_m 

Yes, it was in the back of my mind that it would be possible for 2 SUMS to be the same to lead to a false dupe.  It seems most solutions inevitably lead back to the same few workarounds (often involving TEXTJOIN).

 

A better approach I could go with:

 

=LET(joined,BYROW(colours,LAMBDA(row,TEXTJOIN("",,SORT(row,,,1)))),SelectRows,XMATCH(UNIQUE(joined),joined),CHOOSEROWS(colours,SelectRows))

 

Very similar to Peter's:

 

= LET(joined, BYROW(colours, LAMBDA(r,TEXTJOIN("|",,SORT(r,,,1)))),ptr, XMATCH(UNIQUE(joined),joined),INDEX(colours,ptr,{1,2,3}))

 

 

A bit of 'balloon folding' required with any of the solutions provided.