Forum Discussion

PeterBartholomew1's avatar
PeterBartholomew1
Silver Contributor
Jul 09, 2022

Filtering unordered records

 

This post contains a number of solutions to a problem of sorting arbitrarily ordered colours to pick out distinct combinations.  I start with the formula as it should be, which sadly gives rise to the #CALC! 'Nested Arrays are not supported' error.  Since the solution to the problem is an array of arrays workarounds are required.

 

I have presented 4 possibilities.

1. Sort and combine each record using TEXTJOIN.  Use resulting string to return array from the original data;

2. Sort and combine each record into fixed length strings using Lambda functions.  Separate distinct combinations using the MID function acting of the array;

3. Sort and combine distinct records using REDUCE and VSTACK;

4. Sort (multiple times) and recombine distinct records using MAKEARRAY.

Which do you consider to be preferable?

Do you have other ideas?

  • Patrick2788's avatar
    Patrick2788
    Silver Contributor

    PeterBartholomew1 

    I'm going to drop another solution in the hopper:

     

     

    UniqueX
    =LAMBDA(matrix,LET(
        r, ROWS(matrix),
        c, COLUMNS(matrix),
        autonum, TOCOL(SEQUENCE(r) * SEQUENCE(, c, 1, 0)),
        vector, TOCOL(matrix),
        sorted_vector, DROP(SORT(HSTACK(autonum, vector), {1, 2}), , 1),
        wrapped, WRAPROWS(sorted_vector, c),
        UNIQUE(wrapped)
    ))

     

    • djclements's avatar
      djclements
      Bronze Contributor

      Patrick2788 Nice one. I like your autonum formula. 🙂

       

      Here's another slight variation using SORTBY, plus GROUPBY to output the final summary:

       

      =LET(
          table, Table1[#All],
          country, TAKE(table,, 1),
          colors, DROP(table,, 1),
          cols, COLUMNS(colors),
          arr, TOCOL(colors),
          rowId, TOCOL(IF(SEQUENCE(, cols), SEQUENCE(ROWS(colors)))),
          sortRows, WRAPROWS(SORTBY(arr, rowId,, arr, ), cols),
          GROUPBY(sortRows, country, ARRAYTOTEXT, 3, 0)
      )

       

      The GROUPBY function is not yet available to me, so I was unable to test it fully. Without GROUPBY, another option could be:

       

      =LET(
          table, Table1[#All],
          country, TAKE(table,, 1),
          colors, DROP(table,, 1),
          cols, COLUMNS(colors),
          arr, TOCOL(colors),
          rowId, TOCOL(IF(SEQUENCE(, cols), SEQUENCE(ROWS(colors)))),
          sortRows, WRAPROWS(SORTBY(arr, rowId,, arr, ), cols),
          joinCols, REDUCE(CHOOSECOLS(sortRows, 1), SEQUENCE(cols-1,, 2), LAMBDA(v,n, v&"|"&CHOOSECOLS(sortRows, n))),
          HSTACK(UNIQUE(sortRows), MAP(UNIQUE(joinCols), LAMBDA(v, ARRAYTOTEXT(FILTER(country, v=joinCols)))))
      )
  • Patrick2788's avatar
    Patrick2788
    Silver Contributor

    PeterBartholomew1 

    Revisting this task with MAKEARRAY.

     

    It's a bit slow but we can't all be lion tamers:

     

    'UniqueByRow
    =LAMBDA(matrix,LET(
        height, ROWS(matrix),
        width, COLUMNS(matrix),
        Shuffle, LAMBDA(r, c,
            LET(Sorted, SORT(CHOOSEROWS(matrix, r), , , 1), val, INDEX(Sorted, , c), val)
        ),
        Results, MAKEARRAY(height, width, Shuffle),
        UNIQUE(Results)
    ))
    • PeterBartholomew1's avatar
      PeterBartholomew1
      Silver Contributor

      Patrick2788 

      I like the definition of Shuffle prior to passing it as a parameter to MAKEARRAY (I am so conditioned to my own conventions that I had to rename it 'Shuffleλ').

       

      As an aside, I downloaded a copy of my own problem definition file and reimplemented your formula in order to see it working step by step.  On opening the file, it showed

      The banner was a surprise, I have never seen this before.

      • Patrick2788's avatar
        Patrick2788
        Silver Contributor
        At first glance, it might appear to be a request to feed data to the recently deployed Co-Pilot, but I see someone on Mr. Excel's forum reported seeing it back in January of this year. Did you choose to 'Select Cell'?
  • lori_m's avatar
    lori_m
    Steel Contributor

    SergeiBaklan 

    Nice use of XOR, not seen that before...

     

    PeterBartholomew1 

    I'd consider creating a generic version of BYROW that supports arrays

    = LET(
       sorted, BYROWλ(colours,
          LAMBDA(r, SORT(r,,,1))),
       UNIQUE(sorted))
    
    BYROWλ
    =LAMBDA(array,function,
         REDUCE(1,
             SEQUENCE(ROWS(array)),
             LAMBDA(A,i,
                  IF(SEQUENCE(i)=i,
                      function(INDEX(array,i,)),
                      A))))

    And hope that BYROWλ could be replaced by the real thing in the not too distant future!

     

    In beta version, another option could be to unpivot and sort by column,

    =LET(
    unpivot, HSTACK(TOCOL(colours),TOCOL(IF({1},country,colours))),
    sorted,  WRAPROWS(TAKE(SORT(unpivot,{2,1}),,1),COLUMNS(colours)),
    UNIQUE(sorted)
    )

     

    • lori_m 

      SORT( array, {2,1} ) is where I stopped previous time, thank you.

      universalBYROW() - yes, I don't thin it'll be something out of the box in nearest future.

  • Patrick2788's avatar
    Patrick2788
    Silver Contributor

    PeterBartholomew1 

    My first go at it I used CODE to identify where color combinations repeated.

    Created a custom function to obtain the SUM of code by row:

     

    =LAMBDA(row,SUM(CODE(MID(row,SEQUENCE(LEN(row)),1))))

     

     

    Used the function in a formula to obtain the results. The formula is a bit lengthy and unkept at the moment, but it seems to work:

     

     

    =LET(CodeTotals,BYROW(colours,LAMBDA(row,SumCode(TEXTJOIN("",,row)))),ObtainRows,XMATCH(UNIQUE(CodeTotals),CodeTotals),CHOOSEROWS(colours,ObtainRows))

     

    The solution might be a bit more elegant, but I ran into the array of arrays limitation along the way where an XLOOKUP could've finished it off.

    • lori_m's avatar
      lori_m
      Steel Contributor

      Patrick2788 

      Summing character codes appears to work in this case but not sure it generalises
      eg "ad" and "bc" could return the same result (197)

       

      A related idea could be to assign binary values to each colour. 

      i.e. black=1, blue=2, gold=4, green=8, red=16, white=32

       

      =LET(
      colour,SORT(UNIQUE(TOCOL(colours))),
      code,2^SEQUENCE(ROWS(colour),,0),
      HSTACK(colour,code))

       

       Summing colour codes by row could generate a sort key independent of order (assuming colours don't repeat)

      • Patrick2788's avatar
        Patrick2788
        Silver Contributor

        lori_m 

        Yes, it was in the back of my mind that it would be possible for 2 SUMS to be the same to lead to a false dupe.  It seems most solutions inevitably lead back to the same few workarounds (often involving TEXTJOIN).

         

        A better approach I could go with:

         

        =LET(joined,BYROW(colours,LAMBDA(row,TEXTJOIN("",,SORT(row,,,1)))),SelectRows,XMATCH(UNIQUE(joined),joined),CHOOSEROWS(colours,SelectRows))

         

        Very similar to Peter's:

         

        = LET(joined, BYROW(colours, LAMBDA(r,TEXTJOIN("|",,SORT(r,,,1)))),ptr, XMATCH(UNIQUE(joined),joined),INDEX(colours,ptr,{1,2,3}))

         

         

        A bit of 'balloon folding' required with any of the solutions provided.

  • PeterBartholomew1 

    That's interesting to play with it

    sortRows=
    LAMBDA( range,
        LET(
            topRow, SORT( CHOOSEROWS( range, 1),,, 1 ),
            IF( ROWS(range) = 1,
                topRow,
                VSTACK( topRow, sortRows( DROP( range, 1 )))
            )
    ));
    
    distinctSortedRows=
    LAMBDA( range,
        UNIQUE( sortRows( range )));
    
    checkSameRows=
    LAMBDA( horizontalVector, array,
        BYROW( array, LAMBDA( r, XOR( r = horizontalVector) )) );
    
    getCountriesForColourCombination=
    LAMBDA( horizontalVector,
        TEXTJOIN(",",1,
            IF( checkSameRows( horizontalVector, sortRows(colours)  ),
                country,""))
    );
    
    countriesByColourCombinations=
    LET(
      distinctColours, distinctSortedRows( colours ),
      HSTACK( distinctColours,
        BYROW( distinctColours, getCountriesForColourCombination ) )
    );

Resources