Forum Discussion
PeterBartholomew1
Jul 09, 2022Silver Contributor
Filtering unordered records
This post contains a number of solutions to a problem of sorting arbitrarily ordered colours to pick out distinct combinations. I start with the formula as it should be, which sadly gives rise to the #CALC! 'Nested Arrays are not supported' error. Since the solution to the problem is an array of arrays workarounds are required.
I have presented 4 possibilities.
1. Sort and combine each record using TEXTJOIN. Use resulting string to return array from the original data;
2. Sort and combine each record into fixed length strings using Lambda functions. Separate distinct combinations using the MID function acting of the array;
3. Sort and combine distinct records using REDUCE and VSTACK;
4. Sort (multiple times) and recombine distinct records using MAKEARRAY.
Which do you consider to be preferable?
Do you have other ideas?
- Patrick2788Silver Contributor
I'm going to drop another solution in the hopper:
UniqueX =LAMBDA(matrix,LET( r, ROWS(matrix), c, COLUMNS(matrix), autonum, TOCOL(SEQUENCE(r) * SEQUENCE(, c, 1, 0)), vector, TOCOL(matrix), sorted_vector, DROP(SORT(HSTACK(autonum, vector), {1, 2}), , 1), wrapped, WRAPROWS(sorted_vector, c), UNIQUE(wrapped) ))
- djclementsBronze Contributor
Patrick2788 Nice one. I like your autonum formula. 🙂
Here's another slight variation using SORTBY, plus GROUPBY to output the final summary:
=LET( table, Table1[#All], country, TAKE(table,, 1), colors, DROP(table,, 1), cols, COLUMNS(colors), arr, TOCOL(colors), rowId, TOCOL(IF(SEQUENCE(, cols), SEQUENCE(ROWS(colors)))), sortRows, WRAPROWS(SORTBY(arr, rowId,, arr, ), cols), GROUPBY(sortRows, country, ARRAYTOTEXT, 3, 0) )
The GROUPBY function is not yet available to me, so I was unable to test it fully. Without GROUPBY, another option could be:
=LET( table, Table1[#All], country, TAKE(table,, 1), colors, DROP(table,, 1), cols, COLUMNS(colors), arr, TOCOL(colors), rowId, TOCOL(IF(SEQUENCE(, cols), SEQUENCE(ROWS(colors)))), sortRows, WRAPROWS(SORTBY(arr, rowId,, arr, ), cols), joinCols, REDUCE(CHOOSECOLS(sortRows, 1), SEQUENCE(cols-1,, 2), LAMBDA(v,n, v&"|"&CHOOSECOLS(sortRows, n))), HSTACK(UNIQUE(sortRows), MAP(UNIQUE(joinCols), LAMBDA(v, ARRAYTOTEXT(FILTER(country, v=joinCols))))) )
- Patrick2788Silver Contributor
Revisting this task with MAKEARRAY.
It's a bit slow but we can't all be lion tamers:
'UniqueByRow =LAMBDA(matrix,LET( height, ROWS(matrix), width, COLUMNS(matrix), Shuffle, LAMBDA(r, c, LET(Sorted, SORT(CHOOSEROWS(matrix, r), , , 1), val, INDEX(Sorted, , c), val) ), Results, MAKEARRAY(height, width, Shuffle), UNIQUE(Results) ))
- PeterBartholomew1Silver Contributor
I like the definition of Shuffle prior to passing it as a parameter to MAKEARRAY (I am so conditioned to my own conventions that I had to rename it 'Shuffleλ').
As an aside, I downloaded a copy of my own problem definition file and reimplemented your formula in order to see it working step by step. On opening the file, it showed
The banner was a surprise, I have never seen this before.
- Patrick2788Silver ContributorAt first glance, it might appear to be a request to feed data to the recently deployed Co-Pilot, but I see someone on Mr. Excel's forum reported seeing it back in January of this year. Did you choose to 'Select Cell'?
- lori_mSteel Contributor
Nice use of XOR, not seen that before...
I'd consider creating a generic version of BYROW that supports arrays
= LET( sorted, BYROWλ(colours, LAMBDA(r, SORT(r,,,1))), UNIQUE(sorted)) BYROWλ =LAMBDA(array,function, REDUCE(1, SEQUENCE(ROWS(array)), LAMBDA(A,i, IF(SEQUENCE(i)=i, function(INDEX(array,i,)), A))))
And hope that BYROWλ could be replaced by the real thing in the not too distant future!
In beta version, another option could be to unpivot and sort by column,
=LET( unpivot, HSTACK(TOCOL(colours),TOCOL(IF({1},country,colours))), sorted, WRAPROWS(TAKE(SORT(unpivot,{2,1}),,1),COLUMNS(colours)), UNIQUE(sorted) )
SORT( array, {2,1} ) is where I stopped previous time, thank you.
universalBYROW() - yes, I don't thin it'll be something out of the box in nearest future.
- Patrick2788Silver Contributor
My first go at it I used CODE to identify where color combinations repeated.
Created a custom function to obtain the SUM of code by row:
=LAMBDA(row,SUM(CODE(MID(row,SEQUENCE(LEN(row)),1))))
Used the function in a formula to obtain the results. The formula is a bit lengthy and unkept at the moment, but it seems to work:
=LET(CodeTotals,BYROW(colours,LAMBDA(row,SumCode(TEXTJOIN("",,row)))),ObtainRows,XMATCH(UNIQUE(CodeTotals),CodeTotals),CHOOSEROWS(colours,ObtainRows))
The solution might be a bit more elegant, but I ran into the array of arrays limitation along the way where an XLOOKUP could've finished it off.
- PeterBartholomew1Silver ContributorSound a bit like generating hash codes.
- lori_mSteel Contributor
Summing character codes appears to work in this case but not sure it generalises
eg "ad" and "bc" could return the same result (197)A related idea could be to assign binary values to each colour.
i.e. black=1, blue=2, gold=4, green=8, red=16, white=32
=LET( colour,SORT(UNIQUE(TOCOL(colours))), code,2^SEQUENCE(ROWS(colour),,0), HSTACK(colour,code))
Summing colour codes by row could generate a sort key independent of order (assuming colours don't repeat)
- Patrick2788Silver Contributor
Yes, it was in the back of my mind that it would be possible for 2 SUMS to be the same to lead to a false dupe. It seems most solutions inevitably lead back to the same few workarounds (often involving TEXTJOIN).
A better approach I could go with:
=LET(joined,BYROW(colours,LAMBDA(row,TEXTJOIN("",,SORT(row,,,1)))),SelectRows,XMATCH(UNIQUE(joined),joined),CHOOSEROWS(colours,SelectRows))
Very similar to Peter's:
= LET(joined, BYROW(colours, LAMBDA(r,TEXTJOIN("|",,SORT(r,,,1)))),ptr, XMATCH(UNIQUE(joined),joined),INDEX(colours,ptr,{1,2,3}))
A bit of 'balloon folding' required with any of the solutions provided.
That's interesting to play with it
sortRows= LAMBDA( range, LET( topRow, SORT( CHOOSEROWS( range, 1),,, 1 ), IF( ROWS(range) = 1, topRow, VSTACK( topRow, sortRows( DROP( range, 1 ))) ) )); distinctSortedRows= LAMBDA( range, UNIQUE( sortRows( range ))); checkSameRows= LAMBDA( horizontalVector, array, BYROW( array, LAMBDA( r, XOR( r = horizontalVector) )) ); getCountriesForColourCombination= LAMBDA( horizontalVector, TEXTJOIN(",",1, IF( checkSameRows( horizontalVector, sortRows(colours) ), country,"")) ); countriesByColourCombinations= LET( distinctColours, distinctSortedRows( colours ), HSTACK( distinctColours, BYROW( distinctColours, getCountriesForColourCombination ) ) );