Filtering unordered records

Silver Contributor

image.png

 

This post contains a number of solutions to a problem of sorting arbitrarily ordered colours to pick out distinct combinations.  I start with the formula as it should be, which sadly gives rise to the #CALC! 'Nested Arrays are not supported' error.  Since the solution to the problem is an array of arrays workarounds are required.

 

I have presented 4 possibilities.

1. Sort and combine each record using TEXTJOIN.  Use resulting string to return array from the original data;

2. Sort and combine each record into fixed length strings using Lambda functions.  Separate distinct combinations using the MID function acting of the array;

3. Sort and combine distinct records using REDUCE and VSTACK;

4. Sort (multiple times) and recombine distinct records using MAKEARRAY.

Which do you consider to be preferable?

Do you have other ideas?

14 Replies

@Peter Bartholomew 

That's interesting to play with it

sortRows=
LAMBDA( range,
    LET(
        topRow, SORT( CHOOSEROWS( range, 1),,, 1 ),
        IF( ROWS(range) = 1,
            topRow,
            VSTACK( topRow, sortRows( DROP( range, 1 )))
        )
));

distinctSortedRows=
LAMBDA( range,
    UNIQUE( sortRows( range )));

checkSameRows=
LAMBDA( horizontalVector, array,
    BYROW( array, LAMBDA( r, XOR( r = horizontalVector) )) );

getCountriesForColourCombination=
LAMBDA( horizontalVector,
    TEXTJOIN(",",1,
        IF( checkSameRows( horizontalVector, sortRows(colours)  ),
            country,""))
);

countriesByColourCombinations=
LET(
  distinctColours, distinctSortedRows( colours ),
  HSTACK( distinctColours,
    BYROW( distinctColours, getCountriesForColourCombination ) )
);

@Peter Bartholomew 

My first go at it I used CODE to identify where color combinations repeated.

Created a custom function to obtain the SUM of code by row:

 

=LAMBDA(row,SUM(CODE(MID(row,SEQUENCE(LEN(row)),1))))

 

 

Used the function in a formula to obtain the results. The formula is a bit lengthy and unkept at the moment, but it seems to work:

 

 

=LET(CodeTotals,BYROW(colours,LAMBDA(row,SumCode(TEXTJOIN("",,row)))),ObtainRows,XMATCH(UNIQUE(CodeTotals),CodeTotals),CHOOSEROWS(colours,ObtainRows))

 

The solution might be a bit more elegant, but I ran into the array of arrays limitation along the way where an XLOOKUP could've finished it off.

@Sergei Baklan 

Nice use of XOR, not seen that before...

 

@Peter Bartholomew 

I'd consider creating a generic version of BYROW that supports arrays

= LET(
   sorted, BYROWλ(colours,
      LAMBDA(r, SORT(r,,,1))),
   UNIQUE(sorted))

BYROWλ
=LAMBDA(array,function,
     REDUCE(1,
         SEQUENCE(ROWS(array)),
         LAMBDA(A,i,
              IF(SEQUENCE(i)=i,
                  function(INDEX(array,i,)),
                  A))))

And hope that BYROWλ could be replaced by the real thing in the not too distant future!

 

In beta version, another option could be to unpivot and sort by column,

=LET(
unpivot, HSTACK(TOCOL(colours),TOCOL(IF({1},country,colours))),
sorted,  WRAPROWS(TAKE(SORT(unpivot,{2,1}),,1),COLUMNS(colours)),
UNIQUE(sorted)
)

 

@lori_m 

SORT( array, {2,1} ) is where I stopped previous time, thank you.

universalBYROW() - yes, I don't thin it'll be something out of the box in nearest future.

@Patrick2788 

Summing character codes appears to work in this case but not sure it generalises
eg "ad" and "bc" could return the same result (197)

 

A related idea could be to assign binary values to each colour. 

i.e. black=1, blue=2, gold=4, green=8, red=16, white=32

 

=LET(
colour,SORT(UNIQUE(TOCOL(colours))),
code,2^SEQUENCE(ROWS(colour),,0),
HSTACK(colour,code))

 

 Summing colour codes by row could generate a sort key independent of order (assuming colours don't repeat)

Sound a bit like generating hash codes.

@lori_m 

Yes, it was in the back of my mind that it would be possible for 2 SUMS to be the same to lead to a false dupe.  It seems most solutions inevitably lead back to the same few workarounds (often involving TEXTJOIN).

 

A better approach I could go with:

 

=LET(joined,BYROW(colours,LAMBDA(row,TEXTJOIN("",,SORT(row,,,1)))),SelectRows,XMATCH(UNIQUE(joined),joined),CHOOSEROWS(colours,SelectRows))

 

Very similar to Peter's:

 

= LET(joined, BYROW(colours, LAMBDA(r,TEXTJOIN("|",,SORT(r,,,1)))),ptr, XMATCH(UNIQUE(joined),joined),INDEX(colours,ptr,{1,2,3}))

 

 

A bit of 'balloon folding' required with any of the solutions provided.

@Peter Bartholomew 

Revisting this task with MAKEARRAY.

 

It's a bit slow but we can't all be lion tamers:

 

'UniqueByRow
=LAMBDA(matrix,LET(
    height, ROWS(matrix),
    width, COLUMNS(matrix),
    Shuffle, LAMBDA(r, c,
        LET(Sorted, SORT(CHOOSEROWS(matrix, r), , , 1), val, INDEX(Sorted, , c), val)
    ),
    Results, MAKEARRAY(height, width, Shuffle),
    UNIQUE(Results)
))

@Patrick2788 

I like the definition of Shuffle prior to passing it as a parameter to MAKEARRAY (I am so conditioned to my own conventions that I had to rename it 'Shuffleλ').

 

As an aside, I downloaded a copy of my own problem definition file and reimplemented your formula in order to see it working step by step.  On opening the file, it showed

image.png

The banner was a surprise, I have never seen this before.

At first glance, it might appear to be a request to feed data to the recently deployed Co-Pilot, but I see someone on Mr. Excel's forum reported seeing it back in January of this year. Did you choose to 'Select Cell'?
I did wonder whether it was Co-Pilot related. The formula visible in the image was displayed as the result of clicking 'Select Cell'. What wasn't clear to me was whether I was being invited to provide further explanation or whether it is aimed at you and others to provide comments such as 'The author is well known for creating idiosyncratic formulas that are best ignored if one is to retain one's sanity'.

@Peter Bartholomew 

It looks like I've joined the club:

 

Patrick2788_0-1702146267148.png

 

It appears after submitting the formula it's then added as a private post in the feedback forum.

@Peter Bartholomew 

I'm going to drop another solution in the hopper:

 

 

UniqueX
=LAMBDA(matrix,LET(
    r, ROWS(matrix),
    c, COLUMNS(matrix),
    autonum, TOCOL(SEQUENCE(r) * SEQUENCE(, c, 1, 0)),
    vector, TOCOL(matrix),
    sorted_vector, DROP(SORT(HSTACK(autonum, vector), {1, 2}), , 1),
    wrapped, WRAPROWS(sorted_vector, c),
    UNIQUE(wrapped)
))

 

@Patrick2788 Nice one. I like your autonum formula. :)

 

Here's another slight variation using SORTBY, plus GROUPBY to output the final summary:

 

=LET(
    table, Table1[#All],
    country, TAKE(table,, 1),
    colors, DROP(table,, 1),
    cols, COLUMNS(colors),
    arr, TOCOL(colors),
    rowId, TOCOL(IF(SEQUENCE(, cols), SEQUENCE(ROWS(colors)))),
    sortRows, WRAPROWS(SORTBY(arr, rowId,, arr, ), cols),
    GROUPBY(sortRows, country, ARRAYTOTEXT, 3, 0)
)

 

The GROUPBY function is not yet available to me, so I was unable to test it fully. Without GROUPBY, another option could be:

 

=LET(
    table, Table1[#All],
    country, TAKE(table,, 1),
    colors, DROP(table,, 1),
    cols, COLUMNS(colors),
    arr, TOCOL(colors),
    rowId, TOCOL(IF(SEQUENCE(, cols), SEQUENCE(ROWS(colors)))),
    sortRows, WRAPROWS(SORTBY(arr, rowId,, arr, ), cols),
    joinCols, REDUCE(CHOOSECOLS(sortRows, 1), SEQUENCE(cols-1,, 2), LAMBDA(v,n, v&"|"&CHOOSECOLS(sortRows, n))),
    HSTACK(UNIQUE(sortRows), MAP(UNIQUE(joinCols), LAMBDA(v, ARRAYTOTEXT(FILTER(country, v=joinCols)))))
)