Oct 28 2022 12:25 PM - edited Oct 29 2022 12:08 PM
@Peter Bartholomew @Sergei Baklan @mtarler
Goal: generate an array of all possible combinations (w/repetitions) from a list of items.
For example, if there are 6 total items with 4 chosen (allowing for repeats), there are 126 possible combinations as evident from:
=COMBINA(6,4)
The challenge is to generate an array of all these possible combinations. My solution is in the attached workbook, but I won't explain it yet, so I don't interfere with anyone's creative process. The most combinations I'm able to generate is 3,003. There seems to be a limit or two preventing me from scaling this setup even bigger.
Oct 29 2022 04:04 PM
This is somewhat lacking in mathematical refinement. I generated the permutations and then sorted and removed duplicates. I don't know what the size limitations might be.
Oct 30 2022 04:12 AM
Oct 30 2022 07:20 AM
Your solution is very efficient. Both of our approaches are similar in that goal is to create the 126x4 matrix (6 items, pick 4 in the example) of numbers.
I like how you don't generate much excess at all array-wise, so there's no need to FILTER. I'll need to brush up on BASE. Tip of the cap there!
Essentially, my approach is to generate an array of numbers from 1111 to 6666 (6 being the total number of items in the example).
=SEQUENCE(REPT(COUNTA(Items),pick)-REPT(1,pick)+1,,REPT(1,pick))
Removed any numbers containing 0, 7, 8, and 9 because there's only 6 items and 0 is never selected.
=UNIQUE(BYROW(Arr,LAMBDA(row,IF(MAX(IFERROR(SEARCH(Exclude,row),0))=0,row,""))),,1)
Converting scalar to array, sorting, joining, splitting, wrapping, etc.
=WRAPROWS(TEXTSPLIT(TEXTJOIN(",",,UNIQUE(BYROW(ExArray,LAMBDA(row,TEXTJOIN(",",,SORT(MID(row,SEQUENCE(,pick),1),,,1)))))),,","),pick)*1
From there it was a matter of a lookup.
I'm going tinker with both solutions to see if I can get the combinations to exceed 3,003. There seems to be a limit with SEQUENCE that's preventing expanding the item and pick # beyond about 10 items or so.
By itself, this is as far as I can get SEQUENCE to calculate:
=SEQUENCE(1000000,53)
Oct 30 2022 11:18 AM
I have had a further go at improving the efficiency of the calculation. However, the core idea is still to generate integers strings corresponding to the permutations and then filter out the repetitions. This time, rather than sorting and using UNIQUE, I have examined each number to check whether it is already correctly sorted left to right. That allows me to use FILTER. The limit I hit on calculation size is that the number of permutations must not exceed 2²⁰. I have attempted to use full array formulas rather than Lambda helper functions where possible. An example of that is in FilterDecrλ where I compare entire columns of the array rather than working row by row.
It must be possible to implement a counter that that replaces any rightmost 0s by the newly incremented digit to their left (creating a jump in the series) but the logic is likely to be more contorted.
Oct 30 2022 02:28 PM
Re: permutation limit
Yes, it seems 9 items, pick 6 (about 500,000 permutations) is a struggle to calculate. My solution calculates half the time while yours holds up a bit more than that. There were a few instances where Excel wouldn't update the elements in the array.
It must be possible to implement a counter that that replaces any rightmost 0s by the newly incremented digit to their left (creating a jump in the series) but the logic is likely to be more contorted.
This seems to be the case. I've been messing around with some seldom used engineering functions like BITLSHIFT/BITRSHIFT but can only get so far with them. Even if I could get something to work there's the issue of using older functions that weren't intended to be used with arrays, so I believe there's some "lifting" being done - not optimal.
Oct 30 2022 03:48 PM
SolutionFinally, I have a solution!
This time I used a Lambda function to add 1 to the previous index (stored in base n) but before returning the value, I scanned the digits and copied the larger digit across.
WorksheetFormula
= LET(
count, COUNTA(Items),
initialise, REPT("0",pick),
indices, VSTACK(
initialise,
SCAN(initialise, SEQUENCE(combinations-1), NextCombinationλ(count,pick))
),
INDEX(Items, 1 + Explodeλ(indices))
)
where
NextCombinationλ
= LAMBDA(n,p,
LAMBDA(initial,s,
LET(
place, n^SEQUENCE(1,p,p-1,-1),
increment, 1 + SUM(place * Explodeλ(initial)),
converted, BASE(increment,n,p),
vector, Explodeλ(converted),
modified, SCAN( , vector, Maxλ),
CONCAT(modified)
)
)
);
supported by
Explodeλ = LAMBDA(term, MID(term, SEQUENCE(1,pick), 1));
Maxλ = LAMBDA(u, v, Max(u,v));
I have just performed the calculation for 8 picks taken for 9 items in 300ms. Ther are 12,870 combinations and the permutations would have been over 43M!
Oct 31 2022 06:29 AM
Impressive! I'm still studying how you pulled this off. When I thought of this experiment several weeks ago, I began by studying the occurrences for each item in each column looking for patterns in the numbers.
This is the distribution of items in each column. There's some symmetry but probably not enough to go on that would top anything you've created. It's interesting that when UNIQUE is used on the above matrix, the occurrences for each number is 2 with the exception of 2 numbers appearing 4 times.
Oct 31 2022 08:39 AM
The key line is
modified, SCAN( , vector, Maxλ),
which scans a row of indices to ensure each sequence increases monotonically. That allows the algorithm to make massive strides forward to the next possible combination rather than simply incrementing through the permutations and testing the outturn.
Oct 31 2022 01:24 PM
@Peter Bartholomew @Patrick2788 Ok so I will submit mine but I think it is very similar to Peter's (but I have some questions for him...):
=LET(b, COUNTA(Items), combN, COMBINA(b,pick), n, SEQUENCE(combN,,0),
series, SCAN(,n,LAMBDA(p,i, CONCAT(SCAN(0, MID(BASE(DECIMAL(p,b)+1,b,pick),SEQUENCE(pick),1),LAMBDA(pp,ii,IF(pp>ii,pp,ii)))))),
INDEX(Items,DECIMAL(MID(series,SEQUENCE(1,pick),1),b)+1))
Peter's is much more readable than mine but I still get lost. In mine I create a list of possible options using base() which is the same from what I can tell. I build my list by a manual sequential adding 1 and then make sure it isn't less then the index before it. In looking back it appears you guy originally did a filtering but I think this is how Peter's new version works also. My series is built on line 2 and proud to say this is the first time I found an advantage to doing a SCAN() without the initialize value to force it to use the first value in the sequence (zero in this case).
Since it is hard to read in the middle is BASE(DECIMAL()+1) which converts the text to a number, then +1 and back to the text. then outside that is a MID() to break it into characters and those characters are sent through the SCAN with the IF(>,,). I use the IF(>,,) instead of MAX because I don't bother converting back to decimal until the end so the IF can handle the ascii.
maybe peter can run the speed test on this and see how it compares.
Oct 31 2022 01:39 PM
@Peter Bartholomew Ok now for my questions to Peter...
you have a line:
SCAN(initialise, SEQUENCE(), NextCombinationλ(count,pick))
then in Next Combination you have:
= LAMBDA(n,p,
LAMBDA(initial,s,
so does the (count,pick) of the lambda caller get assigned to the outer Lambda into n,p and then the SCAN variables get passed to the inner Lambda as initial, s ????
This is a very interesting structure to me that I haven't seen/played with.
If i understand correctly your 'place' and 'increment' variables are basically reconstructing the DECIMAL value? or is there something more?
I'm really perplexed my your Max lambda using MAX() as it appears those are (potentially) ascii characters at that point. did you try it when you exceed 9?
Thank you both, the exercise and diving into the solutions has been very educational.
Oct 31 2022 01:55 PM - edited Oct 31 2022 01:58 PM
I was able to extend yours to 14 items, pick 9, 20,661,046,784 permutations. Excel wanted no part of 15 items!
I'd be interested in knowing the timing of the solutions vs. having a 'cheat sheet' of all the item positions (as constants) and simply looking up each element to obtain the item name.
Oct 31 2022 03:39 PM - edited Oct 31 2022 03:41 PM
Yes, you are correct that I used 'place' and 'increment' in lieu of DECIMAL. Sadly, I had not come across DECIMAL though I assume it was introduced at the same time as BASE! :(
The function definition
NextCombinationλ
= LAMBDA(n,p, LAMBDA(initial,s,
LET(
increment, 1 + DECIMAL(initial,n),
converted, BASE(increment,n,p),
vector, Explodeλ(converted),
modified, SCAN( , vector, Maxλ),
CONCAT(modified)
)
));
now reads somewhat better. Your understanding of the parameter strings for 'NextCombinationλ' is also correct. It would be possible to take the process further and provide each of the parameters individually, expressing the function in its 'Curried' form. Such a form takes the parameters one at a time from left to right. The function is only evaluated when the final parameter value is provided. In this case, I wanted to provide 'n' and 'p' explicitly within a parameter string but allow the function to accept the remaining parameters from the Lambda helper function.
At the time I was developing the formula, I did not think going beyond 9 items was going to be a practicality, so I did not consider the possibility that 'A' and 'B' were going to appear as 'digits'! I guess redefining
Maxλ = LAMBDA(u, v, IF(u>v, u, v));
would be a start. My search for 'readability' above all, has taken me down a path of long variable names, avoiding nested formulas, and placing each statement on a new line. I even asked for an option to use ':=' as an assignment operator in place of alternate comma separators.
The penalty is that my formulas are somewhat idiosyncratic and sometimes the natural language readability can mask the underlying syntax.
Oct 31 2022 03:58 PM
Nov 01 2022 03:35 AM - edited Nov 01 2022 03:36 AM
It has just occurred to me that I stopped my 'Lambda-isation' one step short of a 'perfect' solution.
The principle (which might be better discussed on @Jan Karel Pieterse 's Lambda discussion LI site) is
"Every spreadsheet formula should be expressed as a Lambda function"!
The name chosen for the function should provide a succinct statement of what the function does, and the parameters provide an inclusive list of the formula's immediate precedents.
There are assumptions implicit in this statement. For example, it assumes that the user needs a clear idea of what the formula does but has no need to understand how it does it (the developer and auditor's needs are somewhat different and in the world of spreadsheets the user often adopts multiple roles).
WorksheetFormula
= CombinationsAλ(Items, pick)
CombinationsAλ
= LAMBDA(Items, p,
LET(
count, COUNTA(Items),
initialise, REPT("0",p),
counter, SEQUENCE(COMBINA(count,pick)-1),
indices, VSTACK(initialise,
SCAN(initialise, counter, NextCombinationλ(count,p))),
INDEX(Items, 1 + Explodeλ(indices))
)
);
Matt. I tried removing 'initialise', VSTACK and the use of initialise within SCAN but, this far, without achieving the perfect result!
Patrick. I have included a formula for your occurrence matrix, but the nested array problem makes it ugly. I should rework it as a Scanλ(…, Countifsλ) that uses REDUCE and HSTACK to emulate the intended SCAN while passing the COUNTIFS as a Lambda variable. I have done it before but none of this is a trivial exercise and it is annoying that such workarounds are needed.
Nov 01 2022 04:17 AM
Sorry
= MAKEARRAY(COUNTA(Items),pick,
LAMBDA(r,c,
COUNTIFS(INDEX(result#,,c), INDEX(Items,r))
)
)
would have been better in this case.
Nov 01 2022 06:55 AM
I 'cleaned up' my original solution to remove SEARCH:
=WRAPROWS(TEXTSPLIT(TEXTJOIN("|",,DROP(SORT(UNIQUE(MAP(Arr,LAMBDA(e,IF(OR(MIN(Burst(e))=0,MAX(Burst(e))>COUNTA(Items)),"",TEXTJOIN("|",,SORT(Burst(e),,,1))))))),1)),,"|"),pick)*1
Burst (I like 'Explode' better but I feel I'd be stealing Peter's name) being:
=LAMBDA(scalar,1*MID(scalar,SEQUENCE(,LEN(scalar)),1))
It caps out at 9 items, pick 5. It doesn't matter what I use on the final matrix to convert numbers to names: MAP/XLOOKUP, MAP/CHOOSECOLS, SCAN/CHOOSECOLS, etc. Won't go beyond 9/5 because it has to look up each element.
As aside to this project. Wouldn't it be nice if I could do this ('Explode' a scalar):
With this:
=TEXTSPLIT(D1,"")
The above produces a #VALUE! error. Being able to explode a scalar with TEXTSPLIT would be really convenient.
Nov 01 2022 09:41 AM
By all means use Explode! I think I adopted from a video demonstration by Jack Williams of Microsoft Research, so I have no claim to the name!
I agree that the 'null separator' should be interpreted as 'split at every character boundary'. Not that it would do much good at present due to the fact that
= TEXTSPLIT(list,",")
only returns the first term of each member of the list. The justification is that Excel has never handled nested arrays so, for the sake of backward compatibility, it should do the same now! I have never subscribed to the view that it is a good idea to compromise one's future to maintain backward compatibility to a steaming pile of junk! In any case
= MID(list,{1,3,5,7},1)
always has worked. Although
= TEXTSPLIT(
TEXTJOIN(";",,list),
",",";"
)
works, one runs into parameter character size problems, as I believe you have demonstrated.
Nov 05 2022 07:14 AM
Matt, thanks to your introduction to the DECIMAL function, I was able to put together a reasonably elegant solution to a LinkedIn problem of calculating a check digit for a shipping container code! Probably the first time I have needed base 36 too!
You might also find the dynamic documentation sheet entertaining.
Mar 31 2023 09:10 PM - edited Apr 01 2023 09:59 AM
@Patrick2788 You can try the following approach:
=LET(COMBINA_SETS, LAMBDA(x,n,LET(
NEXT_SET, LAMBDA(ME,x,m,n,i,IF(i=0,x,LET(s,SEQUENCE(,n),y,x+1,
idx, XMATCH(m,IF(i < n,IF(s>i,m+1,y),y),-1,-1),
ME(ME,IF(s=i,INDEX(y,idx),x),m,n,IF(idx=i,0,i-1))))),
y,TOCOL(x),m,ROWS(y),IF(AND(n=1,m=1),x, LET(cnts,
SEQUENCE(COMBINA(m,n)-1),idx,REDUCE(SEQUENCE(,n,1,0),cnts,LAMBDA(ac,i,
VSTACK(ac,NEXT_SET(NEXT_SET,IF(i=1,ac,TAKE(ac,-1)),m,n,n)))),
INDEX(y,idx))))),
COMBINA_SETS(B1:B6,D1))
It uses a user LAMBDA function COMBINA_SETS, which relays on a recursive user LAMBDA function NEXT_SET. To avoid creating a function in the Name Manager, I used the following trick: LAMBDA Formulaic Recursion: It’s All About ME! but you can remove the ME argument and define it in the Name Manager if you want (this a trick to use a recursive function inside of LET). I prefer this way because the sole purpose is to be used in COMBINA_SETS. If you enter a given set of values it returns the next one. The rest is just to use REDUCE to generate all the combinations based on NEXT_SET. It doesn't rely on Excel binary functions, since all of them have limitations on the number of digits of the binary number (DEC2BIN and BASE).
As per the shared link: The current operand stack limit in Excel is 1,024. This should be borne in mind together with calculation times, as the current recursion limit is set as 1,024 divided by (number of lambda parameters + 1)
Better to reduce the input argument list in NEXT_SET, so it can be formulated like this:
=LET(COMBINA_SETS, LAMBDA(x,n,LET(y,TOCOL(x), m,ROWS(y),
NEXT_SET, LAMBDA(ME,x,i,IF(i=0,x,LET(s,SEQUENCE(,n),y, x+1,
idx,XMATCH(m,IF(i<n,IF(s>i,m+1,y),y),-1,-1),
ME(ME,IF(s=i,INDEX(y,idx),x),IF(idx=i,0,i-1))))),
IF(AND(n=1,m=1),x,LET(cnts,SEQUENCE(COMBINA(m,n)-1),
idx,REDUCE(SEQUENCE(,n,1,0),cnts,LAMBDA(ac,i,
VSTACK(ac,NEXT_SET(NEXT_SET,IF(i=1,ac,TAKE(ac,-1)),n)))),
INDEX(y,idx))))),
COMBINA_SETS(B1:B6,D1))
so the limit would be 256 digits (max pick) in the worst-case scenario. Which is bigger than the BASE function limit of 53 digits.
For example, for the input data: {1,2,3,4,5,6} to find the combinations of three elements
1 1 1
1 1 2 <- NEXT_SET(NEXT_SET,A1:C1,6,3,3)
1 1 3 <- NEXT_SET(NEXT_SET,A2:C2,6,3,3)
1 1 4 <- NEXT_SET(NEXT_SET,A3:C3,6,3,3)
1 1 5 <- NEXT_SET(NEXT_SET,A4:C4,6,3,3)
1 1 6 <- NEXT_SET(NEXT_SET,A5:C5,6,3,3)
1 2 2 <- NEXT_SET(NEXT_SET,A6:C6,6,3,3)
...
6 6 6 <- NEXT_SET(NEXT_SET,A56:C56,6,3,3)
The algorithm goes backward, from right to left doing the corresponding increments, if the maximum value is reached, then it goes to the next digit recursively doing the same. It follows the same intuitive process a human will do to generate all combinations.
Here is the output:
Not showing the entire list on the screen.
This solution doesn't generate a set that then requires removing duplicates, etc. it generates exactly the combinations required. It considers also special cases such as m=n, where we don't need a recursion process. It also works in case the input array is column-wise instead of row-wise.
Oct 30 2022 03:48 PM
SolutionFinally, I have a solution!
This time I used a Lambda function to add 1 to the previous index (stored in base n) but before returning the value, I scanned the digits and copied the larger digit across.
WorksheetFormula
= LET(
count, COUNTA(Items),
initialise, REPT("0",pick),
indices, VSTACK(
initialise,
SCAN(initialise, SEQUENCE(combinations-1), NextCombinationλ(count,pick))
),
INDEX(Items, 1 + Explodeλ(indices))
)
where
NextCombinationλ
= LAMBDA(n,p,
LAMBDA(initial,s,
LET(
place, n^SEQUENCE(1,p,p-1,-1),
increment, 1 + SUM(place * Explodeλ(initial)),
converted, BASE(increment,n,p),
vector, Explodeλ(converted),
modified, SCAN( , vector, Maxλ),
CONCAT(modified)
)
)
);
supported by
Explodeλ = LAMBDA(term, MID(term, SEQUENCE(1,pick), 1));
Maxλ = LAMBDA(u, v, Max(u,v));
I have just performed the calculation for 8 picks taken for 9 items in 300ms. Ther are 12,870 combinations and the permutations would have been over 43M!