
Patrick2788
Silver Contributor
Oct 28, 2022
Solved

LAMBDA Halloween Challenge

PeterBartholomew1 SergeiBaklan mtarler 

Goal: generate an array of all possible combinations (w/repetitions) from a list of items.

 

For example, if there are 6 total items with 4 chosen (allowing for repeats), there are 126 possible combinations as evident from:

 

 

=COMBINA(6,4)
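The figure of 126 follows from the standard identity COMBINA(n, k) = C(n + k - 1, k). As a minimal Python sketch (the helper name `combina` is made up here for illustration), the closed form can be cross-checked against a direct enumeration:

```python
from itertools import combinations_with_replacement
from math import comb

def combina(n, k):
    """Count k-item selections from n items with repetition (mirrors Excel's COMBINA)."""
    return comb(n + k - 1, k)

# Enumerate the combinations directly to cross-check the closed form.
rows = list(combinations_with_replacement(range(1, 7), 4))
print(combina(6, 4), len(rows))
```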

 

 

 

The challenge is to generate an array of all these possible combinations.  My solution is in the attached workbook, but I won't explain it yet, so I don't interfere with anyone's creative process.  The most combinations I'm able to generate is 3,003. There seems to be a limit or two preventing me from scaling this setup even bigger.

 

 


22 Replies

  • Patrick2788
    Silver Contributor

    This is a novice's attempt at completing this challenge with Python in Excel.

     

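    import itertools as it
    import numpy as np  # normally preloaded in Python in Excel; listed for completeness
    
    items = np.array(xl("Items")).tolist()  # named range -> nested list
    k = xl("pick")
    
    # Every k-item combination, repeats allowed.
    combinations = list(it.combinations_with_replacement(items, k))
    
    # Flatten, then rebuild as one combination per row.
    flattened = np.array(combinations).flatten()
    l = len(flattened) // k
    
    flattened.reshape((l, k))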

     SergeiBaklan 

    I'm interested in any improvements that can be made to this approach.

    • SergeiBaklan
      Diamond Contributor

      Patrick2788, I'm not a Pythonist, nor someone who is able to improve Python code. However, perhaps this is a bit shorter:

      import itertools as it
      
      items        = xl("Items").to_numpy().tolist()
      pick         = xl("pick")
      combinations = list(it.combinations_with_replacement(items,pick) )
      
      np.vstack(combinations).reshape(-1,pick)

      not sure about performance.

      • Patrick2788
        Silver Contributor
        That's an interesting approach. The combinations part (or permutations) is straightforward. It becomes a matter of how best to unpack the list, because Excel does not like nested arrays! This gives me something to experiment with. Thank you for the reply!
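One way to sidestep the unpacking step is to flatten the items before combining them, so each combination is already a flat row. The sketch below uses only the standard library and assumes the worksheet range arrives as a single-column nested list; the `items` value is a hypothetical stand-in for the range, and `combination_rows` is an illustrative name:

```python
from itertools import combinations_with_replacement

def combination_rows(items, k):
    """Return every k-combination with repetition as a list of flat rows."""
    # A single-column range arrives as [[a], [b], ...]; unwrap each cell first.
    flat = [row[0] if isinstance(row, (list, tuple)) else row for row in items]
    return [list(combo) for combo in combinations_with_replacement(flat, k)]

# Hypothetical stand-in for the worksheet range "Items".
items = [["A"], ["B"], ["C"]]
rows = combination_rows(items, 2)
print(rows)
```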
  • davidleal
    Iron Contributor

    Patrick2788 You can try the following approach:

     

    =LET(COMBINA_SETS, LAMBDA(x,n,LET(
     NEXT_SET, LAMBDA(ME,x,m,n,i,IF(i=0,x,LET(s,SEQUENCE(,n),y,x+1,
       idx, XMATCH(m,IF(i < n,IF(s>i,m+1,y),y),-1,-1),
       ME(ME,IF(s=i,INDEX(y,idx),x),m,n,IF(idx=i,0,i-1))))),
     y,TOCOL(x),m,ROWS(y),IF(AND(n=1,m=1),x, LET(cnts, 
      SEQUENCE(COMBINA(m,n)-1),idx,REDUCE(SEQUENCE(,n,1,0),cnts,LAMBDA(ac,i,
      VSTACK(ac,NEXT_SET(NEXT_SET,IF(i=1,ac,TAKE(ac,-1)),m,n,n)))),
      INDEX(y,idx))))), 
      COMBINA_SETS(B1:B6,D1))

     

    It uses a user-defined LAMBDA function, COMBINA_SETS, which relies on a recursive user-defined LAMBDA function, NEXT_SET. To avoid creating a function in the Name Manager, I used the following trick for calling a recursive function inside LET: https://www.sumproduct.com/news/article/lambda-formulaic-recursion-its-all-about-me but you can remove the ME argument and define NEXT_SET in the Name Manager if you want. I prefer this way because NEXT_SET's sole purpose is to be used in COMBINA_SETS. Given a set of values, NEXT_SET returns the next one. The rest just uses REDUCE to generate all the combinations based on NEXT_SET. It doesn't rely on Excel's base-conversion functions (DEC2BIN and BASE), since all of them limit the number of digits in the converted number.

     

    As per the shared link: The current operand stack limit in Excel is 1,024.  This should be borne in mind together with calculation times, as the current recursion limit is set as 1,024 divided by (number of lambda parameters + 1)

     

    Better to reduce the input argument list in NEXT_SET, so it can be formulated like this:

     

    =LET(COMBINA_SETS, LAMBDA(x,n,LET(y,TOCOL(x), m,ROWS(y),
     NEXT_SET, LAMBDA(ME,x,i,IF(i=0,x,LET(s,SEQUENCE(,n),y, x+1,
      idx,XMATCH(m,IF(i<n,IF(s>i,m+1,y),y),-1,-1),
      ME(ME,IF(s=i,INDEX(y,idx),x),IF(idx=i,0,i-1))))),
     IF(AND(n=1,m=1),x,LET(cnts,SEQUENCE(COMBINA(m,n)-1), 
      idx,REDUCE(SEQUENCE(,n,1,0),cnts,LAMBDA(ac,i,
      VSTACK(ac,NEXT_SET(NEXT_SET,IF(i=1,ac,TAKE(ac,-1)),n)))),
      INDEX(y,idx))))),
     COMBINA_SETS(B1:B6,D1))

     

    so, with three parameters, the limit would be 1,024 / (3 + 1) = 256 digits (max pick) in the worst-case scenario, which is bigger than the BASE function limit of 53 digits.
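The arithmetic behind that figure can be written out as a small Python sketch. It takes the 1,024 operand stack limit and the "divide by parameters + 1" rule from the linked article as given; the function name is illustrative only:

```python
OPERAND_STACK_LIMIT = 1024  # figure quoted from the linked article

def max_recursion_depth(parameter_count):
    """Recursion limit = stack limit divided by (number of lambda parameters + 1)."""
    return OPERAND_STACK_LIMIT // (parameter_count + 1)

print(max_recursion_depth(5))  # first version: NEXT_SET(ME, x, m, n, i)
print(max_recursion_depth(3))  # reduced version: NEXT_SET(ME, x, i)
```

Dropping two parameters raises the available depth from 170 to 256 calls.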

     

    For example, for the input data: {1,2,3,4,5,6} to find the combinations of three elements 

     

    1   1   1
    1   1   2 <- NEXT_SET(NEXT_SET,A1:C1,6,3,3)
    1   1   3 <- NEXT_SET(NEXT_SET,A2:C2,6,3,3)
    1   1   4 <- NEXT_SET(NEXT_SET,A3:C3,6,3,3)
    1   1   5 <- NEXT_SET(NEXT_SET,A4:C4,6,3,3)
    1   1   6 <- NEXT_SET(NEXT_SET,A5:C5,6,3,3)
    1   2   2 <- NEXT_SET(NEXT_SET,A6:C6,6,3,3)
    ...
    6   6   6 <- NEXT_SET(NEXT_SET,A55:C55,6,3,3)

     

    The algorithm works backward, from right to left, applying the corresponding increments; if a digit has reached the maximum value, it moves to the next digit to the left and recursively does the same. It follows the same intuitive process a human would use to generate all the combinations.
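The right-to-left stepping described above can be sketched in Python. This is an iterative rendering written for illustration, not a translation of the recursive NEXT_SET formula; the names `next_set` and `all_sets` are assumptions, and indices are 1-based as in the example table:

```python
def next_set(current, m):
    """Return the combination after `current` (1-based indices, max value m), or None."""
    nxt = list(current)
    i = len(nxt) - 1
    # Walk right to left past every position already at the maximum.
    while i >= 0 and nxt[i] == m:
        i -= 1
    if i < 0:
        return None  # current was m, m, ..., m: nothing follows
    # Increment that position and copy the new value to everything on its right.
    nxt[i:] = [nxt[i] + 1] * (len(nxt) - i)
    return nxt

def all_sets(m, n):
    """Start from 1, 1, ..., 1 and keep stepping until no successor exists."""
    rows = [[1] * n]
    while (nxt := next_set(rows[-1], m)) is not None:
        rows.append(nxt)
    return rows

rows = all_sets(6, 3)
print(len(rows), rows[5], rows[6], rows[-1])
```

For 6 items and 3 picks this reproduces the table: 56 rows, with 1 1 6 followed by 1 2 2 and ending at 6 6 6.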

     

    Here is the output:

    Not showing the entire list on the screen.

     

    This solution doesn't generate a larger set that then requires removing duplicates; it generates exactly the combinations required. It also handles special cases such as m=n, where no recursion is needed, and it works when the input array is column-wise instead of row-wise.

     

  • Patrick2788
    Silver Contributor

    Patrick2788 

     

    I 'cleaned up' my original solution to remove SEARCH:

     

    =WRAPROWS(TEXTSPLIT(TEXTJOIN("|",,DROP(SORT(UNIQUE(MAP(Arr,LAMBDA(e,IF(OR(MIN(Burst(e))=0,MAX(Burst(e))>COUNTA(Items)),"",TEXTJOIN("|",,SORT(Burst(e),,,1))))))),1)),,"|"),pick)*1

     Burst (I like 'Explode' better but I feel I'd be stealing Peter's name) being:

    =LAMBDA(scalar,1*MID(scalar,SEQUENCE(,LEN(scalar)),1))

     

    It caps out at 9 items, pick 5.  It doesn't matter what I use on the final matrix to convert numbers to names: MAP/XLOOKUP, MAP/CHOOSECOLS, SCAN/CHOOSECOLS, etc. Won't go beyond 9/5 because it has to look up each element.

     

    As an aside to this project, wouldn't it be nice if I could do this ('Explode' a scalar):

    With this:

    =TEXTSPLIT(D1,"")

    The above produces a #VALUE! error.  Being able to explode a scalar with TEXTSPLIT would be really convenient.

     

     

    • PeterBartholomew1
      Silver Contributor

      Patrick2788 

      By all means use Explode!  I think I adopted it from a video demonstration by Jack Williams of Microsoft Research, so I have no claim to the name!

       

      I agree that the 'null separator' should be interpreted as 'split at every character boundary'.  Not that it would do much good at present due to the fact that 

      = TEXTSPLIT(list,",")

      only returns the first term of each member of the list.  The justification is that Excel has never handled nested arrays so, for the sake of backward compatibility, it should do the same now!  I have never subscribed to the view that it is a good idea to compromise one's future to maintain backward compatibility to a steaming pile of junk!  In any case

      = MID(list,{1,3,5,7},1)

      always has worked.  Although

      = TEXTSPLIT(
            TEXTJOIN(";",,list),
            ",",";"
        )

      works, one runs into parameter character size problems, as I believe you have demonstrated.

       

  • Patrick2788 mtarler 

    It has just occurred to me that I stopped my 'Lambda-isation' one step short of a 'perfect' solution.

    The principle (which might be better discussed on JKPieterse 's Lambda discussion LI site) is

    "Every spreadsheet formula should be expressed as a Lambda function"! 

    The name chosen for the function should provide a succinct statement of what the function does, and the parameters provide an inclusive list of the formula's immediate precedents.

     

    There are assumptions implicit in this statement.  For example, it assumes that the user needs a clear idea of what the formula does but has no need to understand how it does it (the developer and auditor's needs are somewhat different and in the world of spreadsheets the user often adopts multiple roles).

     

    WorksheetFormula
    = CombinationsAλ(Items, pick)
    
    CombinationsAλ
    = LAMBDA(Items, p,
        LET(
            count,       COUNTA(Items),
            initialise,  REPT("0",p),
            counter,     SEQUENCE(COMBINA(count,p)-1),
            indices,     VSTACK(initialise, 
                         SCAN(initialise, counter, NextCombinationλ(count,p))),
            INDEX(Items, 1 + Explodeλ(indices))
        )
     );

     

    Matt.  I tried removing 'initialise', VSTACK and the use of initialise within SCAN but, thus far, without achieving the perfect result!

     

    Patrick. I have included a formula for your occurrence matrix, but the nested array problem makes it ugly.  I should rework it as a Scanλ(…, Countifsλ) that uses REDUCE and HSTACK to emulate the intended SCAN while passing the COUNTIFS as a Lambda variable.  I have done it before but none of this is a trivial exercise and it is annoying that such workarounds are needed.

    • PeterBartholomew1
      Silver Contributor

      Sorry

      = MAKEARRAY(COUNTA(Items),pick,
            LAMBDA(r,c, 
                COUNTIFS(INDEX(result#,,c), INDEX(Items,r))
            )
        )

      would have been better in this case.
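For comparison, the same occurrence matrix can be sketched in Python: one row per item, one column per pick position, each cell counting how often the item appears in that column. The spilled `result#` range is replaced here by `itertools.combinations_with_replacement`, which is an assumption made to keep the example self-contained:

```python
from itertools import combinations_with_replacement

def occurrence_matrix(items, pick):
    """Rows = items, columns = pick positions, cells = how often the item appears there."""
    result = list(combinations_with_replacement(items, pick))
    return [
        [sum(1 for row in result if row[c] == item) for c in range(pick)]
        for item in items
    ]

matrix = occurrence_matrix([1, 2, 3, 4, 5, 6], 4)
for line in matrix:
    print(line)
```

Each column sums to the total number of combinations, and the matrix is symmetric under reversing both the item order and the column order.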

  • Patrick2788 

    Finally, I have a solution!

    This time I used a Lambda function to add 1 to the previous index (stored in base n) but before returning the value, I scanned the digits and copied the larger digit across.

    WorksheetFormula
    = LET(
          count, COUNTA(Items),
          initialise, REPT("0",pick),
          indices, VSTACK(
                   initialise, 
                   SCAN(initialise, SEQUENCE(combinations-1), NextCombinationλ(count,pick))
                   ),
          INDEX(Items, 1 + Explodeλ(indices))
       )

    where

    NextCombinationλ
    = LAMBDA(n,p, 
         LAMBDA(initial,s,
            LET(
                place,     n^SEQUENCE(1,p,p-1,-1),
                increment, 1 + SUM(place * Explodeλ(initial)),
                converted, BASE(increment,n,p),
                vector,    Explodeλ(converted),
                modified,  SCAN( , vector, Maxλ),
                CONCAT(modified)
            )
         )
      );

    supported by

    Explodeλ = LAMBDA(term, MID(term, SEQUENCE(1,pick), 1));
    
    Maxλ = LAMBDA(u, v, Max(u,v));

    I have just performed the calculation for 8 picks taken from 9 items in 300ms.  There are 12,870 combinations and the permutations would have been over 43M!
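A rough Python rendering of the same idea, working on lists of digits rather than text: add 1 to the previous index read as a base-n number, then take a running maximum so that larger digits are copied to the right. The function names are illustrative and the loop stands in for SCAN; treat it as a sketch of the technique rather than a translation of the formulas:

```python
from itertools import accumulate
from math import comb

def next_combination(digits, n):
    """Add 1 to a base-n digit list, then copy larger digits across with a running max."""
    value = 0
    for d in digits:              # digits -> integer
        value = value * n + d
    value += 1
    out = []
    for _ in digits:              # integer -> fixed-width base-n digits
        value, remainder = divmod(value, n)
        out.append(remainder)
    out.reverse()
    return list(accumulate(out, max))

def combinations_a(items, pick):
    """Build every combination with repetition by stepping from 0...0 onward."""
    n = len(items)
    digits = [0] * pick
    rows = [digits]
    for _ in range(comb(n + pick - 1, pick) - 1):
        digits = next_combination(digits, n)
        rows.append(digits)
    return [[items[d] for d in row] for row in rows]

rows = combinations_a(list("ABCDEF"), 4)
print(len(rows), rows[5], rows[6])
```

With 6 items and 4 picks, the step after A A A F (digits 0005) lands on A A B B (digits 0011): the increment gives 0010 and the running maximum fills in the trailing digit.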

     

    • mtarler
      Silver Contributor

      PeterBartholomew1 Ok now for my questions to Peter...
      you have a line:

              SCAN(initialise, SEQUENCE(), NextCombinationλ(count,pick))
      then in Next Combination you have:
      = LAMBDA(n,p,
              LAMBDA(initial,s,

      so does the (count,pick) of the lambda caller get assigned to the outer Lambda into n,p and then the SCAN variables get passed to the inner Lambda as initial, s ????

      This is a very interesting structure to me that I haven't seen/played with.

      If I understand correctly, your 'place' and 'increment' variables are basically reconstructing the DECIMAL value? Or is there something more?

      I'm really perplexed by your Max lambda using MAX(), as it appears those are (potentially) ASCII characters at that point.  Did you try it when you exceed 9?

      Thank you both, the exercise and diving into the solutions has been very educational. 

      • PeterBartholomew1
        Silver Contributor

        mtarler 

        Yes, you are correct that I used 'place' and 'increment' in lieu of DECIMAL.  Sadly, I had not come across DECIMAL though I assume it was introduced at the same time as BASE😞 

        The function definition

         

        NextCombinationλ
        = LAMBDA(n,p, LAMBDA(initial,s,
                LET(          
                    increment, 1 + DECIMAL(initial,n),
                    converted, BASE(increment,n,p),
                    vector,    Explodeλ(converted),
                    modified,  SCAN( , vector, Maxλ),
                    CONCAT(modified)
                )
          ));

         

        now reads somewhat better.  Your understanding of the parameter strings for 'NextCombinationλ' is also correct.  It would be possible to take the process further and provide each of the parameters individually, expressing the function in its 'Curried' form.  Such a form takes the parameters one at a time from left to right.  The function is only evaluated when the final parameter value is provided.  In this case, I wanted to provide 'n' and 'p' explicitly within a parameter string but allow the function to accept the remaining parameters from the Lambda helper function.
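The two-stage parameter passing has a close analogue in a Python closure, sketched below under a few assumptions: `itertools.accumulate` with `initial=` plays the role of VSTACK plus SCAN, `to_base` is a hand-written stand-in for BASE, and `int(text, n)` stands in for DECIMAL. Characters are compared by code point in Python, so 'A' ranks above '9' in the running maximum; that is Python behaviour and says nothing about how Excel compares text.

```python
from itertools import accumulate

DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def to_base(value, n, width):
    """Render value in base n, zero-padded to width characters (stand-in for BASE)."""
    out = []
    for _ in range(width):
        value, remainder = divmod(value, n)
        out.append(DIGITS[remainder])
    return "".join(reversed(out))

def next_combination(n, p):
    """Outer call fixes n and p; the returned function takes the scan's two arguments."""
    def step(initial, s):
        converted = to_base(int(initial, n) + 1, n, p)  # DECIMAL, add 1, BASE
        return "".join(accumulate(converted, max))      # running max over characters
    return step

step = next_combination(6, 4)  # supplies n and p only; nothing is evaluated yet
indices = list(accumulate(range(125), step, initial="0000"))
print(len(indices), indices[6], indices[-1])
```

The counter value `s` is accepted but ignored, just as in the formula: the scan only needs it to drive the right number of steps.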

         

        At the time I was developing the formula, I did not think going beyond 9 items was going to be a practicality, so I did not consider the possibility that 'A' and 'B' were going to appear as 'digits'!  I guess redefining

         

        Maxλ = LAMBDA(u, v, IF(u>v, u, v));

         

        would be a start.  My search for 'readability' above all, has taken me down a path of long variable names, avoiding nested formulas, and placing each statement on a new line.  I even asked for an option to use ':=' as an assignment operator in place of alternate comma separators.

         

        The penalty is that my formulas are somewhat idiosyncratic and sometimes the natural language readability can mask the underlying syntax.

         

    • Patrick2788
      Silver Contributor

      PeterBartholomew1 

      Impressive! I'm still studying how you pulled this off.  When I thought of this experiment several weeks ago, I began by studying the occurrences for each item in each column looking for patterns in the numbers.

      This is the distribution of items in each column.  There's some symmetry but probably not enough to go on that would top anything you've created.  It's interesting that when UNIQUE is used on the above matrix, each number occurs twice, with the exception of 2 numbers that appear 4 times.

      • PeterBartholomew1
        Silver Contributor

        Patrick2788 

        The key line is

             modified,  SCAN( , vector, Maxλ),

        which scans a row of indices to ensure each sequence increases monotonically.  That allows the algorithm to make massive strides forward to the next possible combination rather than simply incrementing through the permutations and testing the outturn.
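In Python terms, that running maximum is a one-liner with `itertools.accumulate`. The sketch below (with an illustrative function name) shows the effect on a single row of digits:

```python
from itertools import accumulate

def make_monotonic(vector):
    """Replace each digit with the largest digit seen so far, reading left to right."""
    return list(accumulate(vector, max))

print(make_monotonic([0, 1, 0, 0]))
```

In base 6, incrementing 0055 gives 0100, and the running maximum turns that into 0111, so the seven intermediate values 0100 to 0110 are never visited.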

    • Patrick2788
      Silver Contributor

      PeterBartholomew1 

      Your solution is very efficient.  Both of our approaches are similar in that the goal is to create the 126x4 matrix (6 items, pick 4 in the example) of numbers.

       

      I like how you don't generate much excess at all array-wise, so there's no need to FILTER.  I'll need to brush up on BASE. Tip of the cap there!

       

      Essentially, my approach is to generate an array of numbers from 1111 to 6666 (6 being the total number of items in the example).

      =SEQUENCE(REPT(COUNTA(Items),pick)-REPT(1,pick)+1,,REPT(1,pick))

      Removed any numbers containing 0, 7, 8, and 9 because there are only 6 items and 0 is never selected.

      =UNIQUE(BYROW(Arr,LAMBDA(row,IF(MAX(IFERROR(SEARCH(Exclude,row),0))=0,row,""))),,1)

      Converting scalar to array, sorting, joining, splitting, wrapping, etc.

      =WRAPROWS(TEXTSPLIT(TEXTJOIN(",",,UNIQUE(BYROW(ExArray,LAMBDA(row,TEXTJOIN(",",,SORT(MID(row,SEQUENCE(,pick),1),,,1)))))),,","),pick)*1

       From there it was a matter of a lookup.
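The steps above can be sketched in Python as a brute-force filter. This is an illustration of the approach rather than a translation of the formulas; the function name is made up, and because each item is encoded as a single decimal digit it only works for up to 9 items:

```python
def combinations_by_filtering(item_count, pick):
    """Brute force over the integers 11...1 to cc...c; only valid for item_count <= 9."""
    start = int("1" * pick)
    stop = int(str(item_count) * pick)
    unique_rows = set()
    for value in range(start, stop + 1):
        digits = [int(ch) for ch in str(value)]
        # Discard numbers using 0 or a digit beyond the item count.
        if min(digits) == 0 or max(digits) > item_count:
            continue
        # Sorting collapses every permutation of a combination to one row.
        unique_rows.add(tuple(sorted(digits)))
    return sorted(unique_rows)

rows = combinations_by_filtering(6, 4)
print(len(rows), rows[0], rows[-1])
```

For 6 items and 4 picks the loop scans 5,556 integers (1111 to 6666) to keep 126 rows, which suggests why the intermediate arrays grow so quickly as items and picks increase.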


      I'm going to tinker with both solutions to see if I can get the combinations to exceed 3,003.  There seems to be a limit with SEQUENCE that's preventing expanding the item and pick # beyond about 10 items or so.

       

      By itself, this is as far as I can get SEQUENCE to calculate:

      =SEQUENCE(1000000,53)
      • PeterBartholomew1
        Silver Contributor

        Patrick2788 

        I have had a further go at improving the efficiency of the calculation.  However, the core idea is still to generate integer strings corresponding to the permutations and then filter out the repetitions.  This time, rather than sorting and using UNIQUE, I have examined each number to check whether it is already correctly sorted left to right.   That allows me to use FILTER.  The limit I hit on calculation size is that the number of permutations must not exceed 2²⁰.  I have attempted to use full array formulas rather than Lambda helper functions where possible.  An example of that is in FilterDecrλ where I compare entire columns of the array rather than working row by row.
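A minimal Python sketch of this filter-based variant, with hypothetical function names: generate every permutation of digits, keep only those already sorted left to right, and check the permutation count against the 2²⁰ ceiling mentioned above (which equals the number of rows on a worksheet, 1,048,576). It is not a rendering of FilterDecrλ itself, whose definition is not shown in the thread.

```python
from itertools import product

def combinations_by_sorted_filter(item_count, pick):
    """Keep only the digit tuples that are already non-decreasing left to right."""
    kept = []
    for digits in product(range(item_count), repeat=pick):
        if all(a <= b for a, b in zip(digits, digits[1:])):
            kept.append(digits)
    return kept

def within_limit(item_count, pick, limit=2**20):
    """True when the number of permutations fits under the stated ceiling."""
    return item_count ** pick <= limit

rows = combinations_by_sorted_filter(6, 4)
print(len(rows), within_limit(6, 4), within_limit(9, 8))
```

Because only sorted tuples survive, no UNIQUE step is needed, but the full set of permutations still has to be generated first.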

         

        It must be possible to implement a counter that replaces any rightmost 0s by the newly incremented digit to their left (creating a jump in the series) but the logic is likely to be more contorted.

         

    • Patrick2788
      Silver Contributor
      This is a clever solution. I'm still unpacking your formula and the use of BASE to generate the numbering. I was able to generate 3,003 combinations. It seems anything beyond that is something SEQUENCE won't calculate.
