Forum Discussion
LAMBDA Halloween Challenge
PeterBartholomew1 SergeiBaklan mtarler
Goal: generate an array of all possible combinations (w/repetitions) from a list of items.
For example, if there are 6 total items with 4 chosen (allowing for repeats), there are 126 possible combinations as evident from:
=COMBINA(6,4)
The challenge is to generate an array of all these possible combinations. My solution is in the attached workbook, but I won't explain it yet, so I don't interfere with anyone's creative process. The most combinations I'm able to generate is 3,003. There seems to be a limit or two preventing me from scaling this setup even bigger.
Finally, I have a solution!
This time I used a Lambda function to add 1 to the previous index (stored in base n) but before returning the value, I scanned the digits and copied the larger digit across.
WorksheetFormula = LET( count, COUNTA(Items), initialise, REPT("0",pick), indices, VSTACK( initialise, SCAN(initialise, SEQUENCE(combinations-1), NextCombinationλ(count,pick)) ), INDEX(Items, 1 + Explodeλ(indices)) )
where
NextCombinationλ = LAMBDA(n,p, LAMBDA(initial,s, LET( place, n^SEQUENCE(1,p,p-1,-1), increment, 1 + SUM(place * Explodeλ(initial)), converted, BASE(increment,n,p), vector, Explodeλ(converted), modified, SCAN( , vector, Maxλ), CONCAT(modified) ) ) );
supported by
Explodeλ = LAMBDA(term, MID(term, SEQUENCE(1,pick), 1)); Maxλ = LAMBDA(u, v, Max(u,v));
I have just performed the calculation for 8 picks taken for 9 items in 300ms. Ther are 12,870 combinations and the permutations would have been over 43M!
22 Replies
- Patrick2788Silver Contributor
This is a novice's attempt at completing this challenge with Python in Excel.
import itertools as it items = np.array(xl("Items")).tolist() k = xl("pick") combinations = list(it.combinations_with_replacement(items,k)) flattened = np.array(combinations).flatten() l = len(flattened)//k np.array(flattened).reshape((l,k))
I'm interested in any improvements that can be made to this approach.
- SergeiBaklanDiamond Contributor
Patrick2788 , I'm not a pythonist and a not person who is able to improve python codes. However, perhaps bit shorter will be
import itertools as it items = xl("Items").to_numpy().tolist() pick = xl("pick") combinations = list(it.combinations_with_replacement(items,pick) ) np.vstack(combinations).reshape(-1,pick)
not sure about performance.
- Patrick2788Silver ContributorThat's an interesting approach. The combinations part (or permutations) is straight forward. It becomes a matter of how to best unpack the list because Excel does not like nested arrays! This gives me something to experiment with. Thank you for the reply!
- davidlealIron Contributor
Patrick2788 You can try the following approach:
=LET(COMBINA_SETS, LAMBDA(x,n,LET( NEXT_SET, LAMBDA(ME,x,m,n,i,IF(i=0,x,LET(s,SEQUENCE(,n),y,x+1, idx, XMATCH(m,IF(i < n,IF(s>i,m+1,y),y),-1,-1), ME(ME,IF(s=i,INDEX(y,idx),x),m,n,IF(idx=i,0,i-1))))), y,TOCOL(x),m,ROWS(y),IF(AND(n=1,m=1),x, LET(cnts, SEQUENCE(COMBINA(m,n)-1),idx,REDUCE(SEQUENCE(,n,1,0),cnts,LAMBDA(ac,i, VSTACK(ac,NEXT_SET(NEXT_SET,IF(i=1,ac,TAKE(ac,-1)),m,n,n)))), INDEX(y,idx))))), COMBINA_SETS(B1:B6,D1))
It uses a user LAMBDA function COMBINA_SETS, which relays on a recursive user LAMBDA function NEXT_SET. To avoid creating a function in the Name Manager, I used the following trick: https://www.sumproduct.com/news/article/lambda-formulaic-recursion-its-all-about-me but you can remove the ME argument and define it in the Name Manager if you want (this a trick to use a recursive function inside of LET). I prefer this way because the sole purpose is to be used in COMBINA_SETS. If you enter a given set of values it returns the next one. The rest is just to use REDUCE to generate all the combinations based on NEXT_SET. It doesn't rely on Excel binary functions, since all of them have limitations on the number of digits of the binary number (DEC2BIN and BASE).
As per the shared link: The current operand stack limit in Excel is 1,024. This should be borne in mind together with calculation times, as the current recursion limit is set as 1,024 divided by (number of lambda parameters + 1)
Better to reduce the input argument list in NEXT_SET, so it can be formulated like this:
=LET(COMBINA_SETS, LAMBDA(x,n,LET(y,TOCOL(x), m,ROWS(y), NEXT_SET, LAMBDA(ME,x,i,IF(i=0,x,LET(s,SEQUENCE(,n),y, x+1, idx,XMATCH(m,IF(i<n,IF(s>i,m+1,y),y),-1,-1), ME(ME,IF(s=i,INDEX(y,idx),x),IF(idx=i,0,i-1))))), IF(AND(n=1,m=1),x,LET(cnts,SEQUENCE(COMBINA(m,n)-1), idx,REDUCE(SEQUENCE(,n,1,0),cnts,LAMBDA(ac,i, VSTACK(ac,NEXT_SET(NEXT_SET,IF(i=1,ac,TAKE(ac,-1)),n)))), INDEX(y,idx))))), COMBINA_SETS(B1:B6,D1))
so the limit would be 256 digits (max pick) in the worst-case scenario. Which is bigger than the BASE function limit of 53 digits.
For example, for the input data: {1,2,3,4,5,6} to find the combinations of three elements
1 1 1 1 1 2 <- NEXT_SET(NEXT_SET,A1:C1,6,3,3) 1 1 3 <- NEXT_SET(NEXT_SET,A2:C2,6,3,3) 1 1 4 <- NEXT_SET(NEXT_SET,A3:C3,6,3,3) 1 1 5 <- NEXT_SET(NEXT_SET,A4:C4,6,3,3) 1 1 6 <- NEXT_SET(NEXT_SET,A5:C5,6,3,3) 1 2 2 <- NEXT_SET(NEXT_SET,A6:C6,6,3,3) ... 6 6 6 <- NEXT_SET(NEXT_SET,A56:C56,6,3,3)
The algorithm goes backward, from right to left doing the corresponding increments, if the maximum value is reached, then it goes to the next digit recursively doing the same. It follows the same intuitive process a human will do to generate all combinations.
Here is the output:
Not showing the entire list on the screen.
This solution doesn't generate a set that then requires removing duplicates, etc. it generates exactly the combinations required. It considers also special cases such as m=n, where we don't need a recursion process. It also works in case the input array is column-wise instead of row-wise.
- PeterBartholomew1Silver Contributor
Matt, thanks to your introduction to the DECIMAL function, I was able to put together a reasonably elegant solution to a LinkedIn problem of calculating a check digit for a shipping container code! Probably the first time I have needed base 36 too!
You might also find the dynamic documentation sheet entertaining.
- Patrick2788Silver Contributor
I 'cleaned up' my original solution to remove SEARCH:
=WRAPROWS(TEXTSPLIT(TEXTJOIN("|",,DROP(SORT(UNIQUE(MAP(Arr,LAMBDA(e,IF(OR(MIN(Burst(e))=0,MAX(Burst(e))>COUNTA(Items)),"",TEXTJOIN("|",,SORT(Burst(e),,,1))))))),1)),,"|"),pick)*1
Burst (I like 'Explode' better but I feel I'd be stealing Peter's name) being:
=LAMBDA(scalar,1*MID(scalar,SEQUENCE(,LEN(scalar)),1))
It caps out at 9 items, pick 5. It doesn't matter what I use on the final matrix to convert numbers to names: MAP/XLOOKUP, MAP/CHOOSECOLS, SCAN/CHOOSECOLS, etc. Won't go beyond 9/5 because it has to look up each element.
As aside to this project. Wouldn't it be nice if I could do this ('Explode' a scalar):
With this:
=TEXTSPLIT(D1,"")
The above produces a #VALUE! error. Being able to explode a scalar with TEXTSPLIT would be really convenient.
- PeterBartholomew1Silver Contributor
By all means use Explode! I think I adopted from a video demonstration by Jack Williams of Microsoft Research, so I have no claim to the name!
I agree that the 'null separator' should be interpreted as 'split at every character boundary'. Not that it would do much good at present due to the fact that
= TEXTSPLIT(list,",")
only returns the first term of each member of the list. The justification is that Excel has never handled nested arrays so, for the sake of backward compatibility, it should do the same now! I have never subscribed to the view that it is a good idea to compromise one's future to maintain backward compatibility to a steaming pile of junk! In any case
= MID(list,{1,3,5,7},1)
always has worked. Although
= TEXTSPLIT( TEXTJOIN(";",,list), ",",";" )
works, one runs into parameter character size problems, as I believe you have demonstrated.
- PeterBartholomew1Silver Contributor
It has just occurred to me that I stopped my 'Lambda-isation' one step short of a 'perfect' solution.
The principle (which might be better discussed on JKPieterse 's Lambda discussion LI site) is
"Every spreadsheet formula should be expressed as a Lambda function"!
The name chosen for the function should provide a succinct statement of what the function does, and the parameters provide an inclusive list of the formula's immediate precedents.
There are assumptions implicit in this statement. For example, it assumes that the user needs a clear idea of what the formula does but has no need to understand how it does it (the developer and auditor's needs are somewhat different and in the world of spreadsheets the user often adopts multiple roles).
WorksheetFormula = CombinationsAλ(Items, pick) CombinationsAλ = LAMBDA(Items, p, LET( count, COUNTA(Items), initialise, REPT("0",p), counter, SEQUENCE(COMBINA(count,pick)-1), indices, VSTACK(initialise, SCAN(initialise, counter, NextCombinationλ(count,p))), INDEX(Items, 1 + Explodeλ(indices)) ) );
Matt. I tried removing 'initialise', VSTACK and the use of initialise within SCAN but, this far, without achieving the perfect result!
Patrick. I have included a formula for your occurrence matrix, but the nested array problem makes it ugly. I should rework it as a Scanλ(…, Countifsλ) that uses REDUCE and HSTACK to emulate the intended SCAN while passing the COUNTIFS as a Lambda variable. I have done it before but none of this is a trivial exercise and it is annoying that such workarounds are needed.
- PeterBartholomew1Silver Contributor
Sorry
= MAKEARRAY(COUNTA(Items),pick, LAMBDA(r,c, COUNTIFS(INDEX(result#,,c), INDEX(Items,r)) ) )
would have been better in this case.
- PeterBartholomew1Silver Contributor
Finally, I have a solution!
This time I used a Lambda function to add 1 to the previous index (stored in base n) but before returning the value, I scanned the digits and copied the larger digit across.
WorksheetFormula = LET( count, COUNTA(Items), initialise, REPT("0",pick), indices, VSTACK( initialise, SCAN(initialise, SEQUENCE(combinations-1), NextCombinationλ(count,pick)) ), INDEX(Items, 1 + Explodeλ(indices)) )
where
NextCombinationλ = LAMBDA(n,p, LAMBDA(initial,s, LET( place, n^SEQUENCE(1,p,p-1,-1), increment, 1 + SUM(place * Explodeλ(initial)), converted, BASE(increment,n,p), vector, Explodeλ(converted), modified, SCAN( , vector, Maxλ), CONCAT(modified) ) ) );
supported by
Explodeλ = LAMBDA(term, MID(term, SEQUENCE(1,pick), 1)); Maxλ = LAMBDA(u, v, Max(u,v));
I have just performed the calculation for 8 picks taken for 9 items in 300ms. Ther are 12,870 combinations and the permutations would have been over 43M!
- mtarlerSilver Contributor
PeterBartholomew1 Ok now for my questions to Peter...
you have a line:SCAN(initialise, SEQUENCE(), NextCombinationλ(count,pick))
then in Next Combination you have:
= LAMBDA(n,p,
LAMBDA(initial,s,so does the (count,pick) of the lambda caller get assigned to the outer Lambda into n,p and then the SCAN variables get passed to the inner Lambda as initial, s ????
This is a very interesting structure to me that I haven't seen/played with.
If i understand correctly your 'place' and 'increment' variables are basically reconstructing the DECIMAL value? or is there something more?
I'm really perplexed my your Max lambda using MAX() as it appears those are (potentially) ascii characters at that point. did you try it when you exceed 9?
Thank you both, the exercise and diving into the solutions has been very educational.
- PeterBartholomew1Silver Contributor
Yes, you are correct that I used 'place' and 'increment' in lieu of DECIMAL. Sadly, I had not come across DECIMAL though I assume it was introduced at the same time as BASE! 😞
The function definition
NextCombinationλ = LAMBDA(n,p, LAMBDA(initial,s, LET( increment, 1 + DECIMAL(initial,n), converted, BASE(increment,n,p), vector, Explodeλ(converted), modified, SCAN( , vector, Maxλ), CONCAT(modified) ) ));
now reads somewhat better. Your understanding of the parameter strings for 'NextCombinationλ' is also correct. It would be possible to take the process further and provide each of the parameters individually, expressing the function in its 'Curried' form. Such a form takes the parameters one at a time from left to right. The function is only evaluated when the final parameter value is provided. In this case, I wanted to provide 'n' and 'p' explicitly within a parameter string but allow the function to accept the remaining parameters from the Lambda helper function.
At the time I was developing the formula, I did not think going beyond 9 items was going to be a practicality, so I did not consider the possibility that 'A' and 'B' were going to appear as 'digits'! I guess redefining
Maxλ = LAMBDA(u, v, IF(u>v, u, v));
would be a start. My search for 'readability' above all, has taken me down a path of long variable names, avoiding nested formulas, and placing each statement on a new line. I even asked for an option to use ':=' as an assignment operator in place of alternate comma separators.
The penalty is that my formulas are somewhat idiosyncratic and sometimes the natural language readability can mask the underlying syntax.
- Patrick2788Silver Contributor
Impressive! I'm still studying how you pulled this off. When I thought of this experiment several weeks ago, I began by studying the occurrences for each item in each column looking for patterns in the numbers.
This is the distribution of items in each column. There's some symmetry but probably not enough to go on that would top anything you've created. It's interesting that when UNIQUE is used on the above matrix, the occurrences for each number is 2 with the exception of 2 numbers appearing 4 times.
- PeterBartholomew1Silver Contributor
The key line is
modified, SCAN( , vector, Maxλ),
which scans a row of indices to ensure each sequence increases monotonically. That allows the algorithm to make massive strides forward to the next possible combination rather than simply incrementing through the permutations and testing the outturn.
- PeterBartholomew1Silver Contributor
This is somewhat lacking in mathematical refinement. I generated the permutations and then sorted and removed duplicates. I don't know what the size limitations might be.
- Patrick2788Silver Contributor
Your solution is very efficient. Both of our approaches are similar in that goal is to create the 126x4 matrix (6 items, pick 4 in the example) of numbers.
I like how you don't generate much excess at all array-wise, so there's no need to FILTER. I'll need to brush up on BASE. Tip of the cap there!
Essentially, my approach is to generate an array of numbers from 1111 to 6666 (6 being the total number of items in the example).
=SEQUENCE(REPT(COUNTA(Items),pick)-REPT(1,pick)+1,,REPT(1,pick))
Removed any numbers containing 0, 7, 8, and 9 because there's only 6 items and 0 is never selected.
=UNIQUE(BYROW(Arr,LAMBDA(row,IF(MAX(IFERROR(SEARCH(Exclude,row),0))=0,row,""))),,1)
Converting scalar to array, sorting, joining, splitting, wrapping, etc.
=WRAPROWS(TEXTSPLIT(TEXTJOIN(",",,UNIQUE(BYROW(ExArray,LAMBDA(row,TEXTJOIN(",",,SORT(MID(row,SEQUENCE(,pick),1),,,1)))))),,","),pick)*1
From there it was a matter of a lookup.
I'm going tinker with both solutions to see if I can get the combinations to exceed 3,003. There seems to be a limit with SEQUENCE that's preventing expanding the item and pick # beyond about 10 items or so.By itself, this is as far as I can get SEQUENCE to calculate:
=SEQUENCE(1000000,53)
- PeterBartholomew1Silver Contributor
I have had a further go at improving the efficiency of the calculation. However, the core idea is still to generate integers strings corresponding to the permutations and then filter out the repetitions. This time, rather than sorting and using UNIQUE, I have examined each number to check whether it is already correctly sorted left to right. That allows me to use FILTER. The limit I hit on calculation size is that the number of permutations must not exceed 2²⁰. I have attempted to use full array formulas rather than Lambda helper functions where possible. An example of that is in FilterDecrλ where I compare entire columns of the array rather than working row by row.
It must be possible to implement a counter that that replaces any rightmost 0s by the newly incremented digit to their left (creating a jump in the series) but the logic is likely to be more contorted.
- Patrick2788Silver ContributorThis is a clever solution. I'm still unpacking your formula and the use of BASE to generate the numbering. I was able to generate 3,003 combinations. It seems anything beyond that is something SEQUENCE won't calculate.