SOLVED

Efficient approach to generate list of combinations with no repetition

Iron Contributor

I am trying to find the best way to generate the combination with no repetition, i.e. the corresponding set related to the total of a combination of COMBIN(n,m) output. I found several approaches with different performances, but maybe there are other ways to do it, only using Excel, and not considering VBA solutions.

 

COMBIN.SETA: The solution with the best performance so far

 

COMBIN.SETA=LAMBDA(x,m, LET(y, TOCOL(x), n, ROWS(y), 
 S, LAMBDA(x,y,SEQUENCE(,x, IF(y="",1,y))), 
 NEXTR, LAMBDA(ME,x,i, LET(j,m-i, y,INDEX(x,j)+1, 
 IF(y<=(n-i), IF(j=1, S(m,y), HSTACK(INDEX(x, S(j-1,)), S(m-j+1,y))),ME(ME,x,i+1)))),
 IF(m=n,x, LET(idx,REDUCE(S(m,), S(COMBIN(n,m)-1,), LAMBDA(ac,a, 
  LET(prev, IF(a=1,ac,TAKE(ac,-1)), VSTACK(ac, NEXTR(NEXTR,prev,0))))), 
  INDEX(y,idx)))))

 

 

 it uses the recursive function NEXTR to generate the next row of the combination, then DROP/REDUCE/VSTACK to generate the final result. Since it uses several SEQUENCE calls I created an auxiliary user LAMBDA function S(x,y) to avoid having a larger formula, but it is just a SEQUENCE call for a specific scenario.

 

The following solutions are based on the following suggestion: How to use COMBIN function. It generates a representation of all binary numbers from 0-2^n-1, then use it as a filter condition in the FILTER function to get the corresponding index values. It doesn't use any binary Excel function that has limitations in the number of digits to be generated. If you concatenate all the values on a given row will be the corresponding binary number, but they are not generated using any binary functions.

 

COMBIN.SETB:

 

COMBIN.SETB = LAMBDA(x, m,
    LET(
        y, TOROW(x),
        n, COLUMNS(y),
        z, MOD(INT((SEQUENCE(2 ^ n) - 1) / 2 ^ SEQUENCE(, n, 0)), 2),
        f, FILTER(z, MMULT(z, SEQUENCE(n) ^ 0) = m),
        DROP(REDUCE("", SEQUENCE(ROWS(f)), LAMBDA(ac, i, VSTACK(ac, FILTER(y, INDEX(f, i, ))))), 1)
    )
)

 

 

It extracts the output via FILTER the columns with value 1, to get the values from y name. I tested that the biggest time consumption is coming from the call DROP/REDUCE/VSTACK, to get z or f output is the less time-consuming portion, but I was not able to find another way to generate the output without using DROP/REDUCE/VSTACK, maybe there is a way but I could not find it. I don't think MAKEARRY will provide a better performance, since it operates at row, col, and level and FILTER gives the entire row.

 

COMBIN.SETC: This variation has the worst performance. It uses exactly the solution from the link shared link but removes the empty cells. It tries to organize the output in a way, that only cells with no empty value will be returned, but this transformation is time-consuming.

 

COMBIN.SETC = LAMBDA(x, m,
    LET(
        y, TOROW(x),
        n, COLUMNS(y),
        z, MOD(INT((SEQUENCE(2 ^ n) - 1) / 2 ^ SEQUENCE(, n, 0)), 2),
        f, IF(FILTER(z, MMULT(z, SEQUENCE(n) ^ 0) = m), y, ""),
        DROP(
            REDUCE(
                "",
                SEQUENCE(ROWS(f)),
                LAMBDA(ac, a, LET(r, INDEX(f, a, ), VSTACK(ac, FILTER(r, r <> ""))))
            ),
            1
        )
    )
)

 

 

COMBIN.SETD: Second-best performance along with COMBINA.SETB. We use here TOROW function to get by row nonempty values in a row, so it replaces FILTER from previous approaches using TOROW

 

 

COMBIN.SETD = LAMBDA(x, m,
    LET(
        y, TOROW(x),
        n, COLUMNS(y),
        z, MOD(INT((SEQUENCE(2 ^ n) - 1) / 2 ^ SEQUENCE(, n, 0)), 2),
        f, IF(FILTER(z, MMULT(z, SEQUENCE(n) ^ 0) = m), y, NA()),
        DROP(REDUCE("", SEQUENCE(ROWS(f)), LAMBDA(ac, a, VSTACK(ac, TOROW(INDEX(f, a, ), 2)))), 1)
    )
)

 

 

Finally, I use the following convenience LAMBDA functions to calculate elapsed time. I don't want to use VBA to measure performance, so I found this way of doing it.

 

PERFORMANCE.SETA = LAMBDA(x, y,
    LET(start, NOW(), out, COMBIN.SET(x, y), 
    IFERROR(HSTACK((NOW() - start) * 24 * 3600, out), "")))
PERFORMANCE.SETB = LAMBDA(x, y,
    LET(start, NOW(), out, COMBIN.SETB(x, y), 
    IFERROR(HSTACK((NOW() - start) * 24 * 3600, out), "")))
PERFORMANCE.SETC = LAMBDA(x, y,
    LET(start, NOW(), out, COMBIN.SETC(x, y), 
    IFERROR(HSTACK((NOW() - start) * 24 * 3600, out), "")))
PERFORMANCE.SETD = LAMBDA(x, y,
    LET(start, NOW(), out, COMBIN.SETD(x, y), 
    IFERROR(HSTACK((NOW() - start) * 24 * 3600, out), "")))

 

 

Then the following screenshot shows the testing scenario, and for each of them, I collect the seconds to do the comparison for each case and to calculate the relative increment concerning the lowest performance CASE A via the following formula: =(B6-$B$5)/$B$5 and drag it down. I set the calculation to Manual for testing each scenario changing the performance function to call with the same input parameters. For example:

 

=PERFORMANCE.SETA(SEQUENCE(,B1,10,10),B2)

 

 

Here is the output

davidleal_0-1680578193876.png

 

I was surprised that the best performance among all of them is SETA, which uses a recursive function, but the recursion is at a column level, which is usually lower than the rows will be generated, and the recursion only happens when the increment on each digit is bigger than the maximum value for each column, so we need to back from right to left to repeat the same process to the next digit.

 

I know generating such a set of combinations is time-consuming, so I am trying to have a solution with the best performance among all. 

 

Note: I use as an input data set: 

 

SEQUENCE(,B1,10,10)

 

 

but please don't assume the data set can be numbers only, it could be also strings, I am looking for a generic and efficient COMBINE.SET function, which can work for numbers or strings.

 

UPDATE

I found the solution COMBIN.SETD, can be improved by removing the DROP/REDUCE/VSTACK, which produces the fastest solution so far, around 0.10 seconds, combining TOCOL with WRAPROWS, as follows:

COMBIN.SETE:

 

COMBIN.SETE = LAMBDA(x, m,
    LET(
        y, TOROW(x),
        n, COLUMNS(y),
        z, MOD(INT((SEQUENCE(2 ^ n) - 1) / 2 ^ SEQUENCE(, n, 0)), 2),
        f, IF(FILTER(z, MMULT(z, SEQUENCE(n) ^ 0) = m), y, NA()),
        WRAPROWS(TOCOL(f,2),m)
    )
)

 

 

Thanks,

 

David

12 Replies

@davidleal 

 

Replacing your potentially costly DROP/REDUCE/VSTACK set-up in COMBIN.SETB with a simple wrap/unwrap combination might be worth a go:

 

=LAMBDA(x, m,
    LET(
        y, TOROW(x),
        n, COLUMNS(y),
        z, MOD(INT((SEQUENCE(2 ^ n) - 1) / 2 ^ SEQUENCE(, n, 0)), 2),
        f, FILTER(z, MMULT(z, SEQUENCE(n) ^ 0) = m),
        WRAPROWS(TOCOL(IF(IF(f, SEQUENCE(, n)), y, NA()), 2), m)
    )
)

 

Regards

@JosWoolley thanks, I updated my question with a similar solution I found using WRAPROWS as you provided while you were posting your solution. It is really fast, I haven't tested yours, but seems to be similar, you have an extra IF, but that maybe not significant. Thanks again!

@JosWoolley I tested with n=19, m=10, 92K rows to be generated, my version SETE, 2.9secs, your solution 3.20secs. I run several times and your solution was always bigger than 3secs, and mine lower than 3secs. Just a minor difference. WRAPROWS makes really the difference!!!

best response confirmed by davidleal (Iron Contributor)
Solution

@davidleal 

Forgive me for asking, but have I got the wrong end of the stick here?

image.png

= IF(Combinationsλ(n,m), "●","-")

Combinationsλ
= LET(
    k, 2 ^ SEQUENCE(1, n, 0),
    h, SEQUENCE(2 ^ n),
    p, SIGN(BITAND(h, k)),
    f, BYROW(p, LAMBDA(p, SUM(p))) = m,
    FILTER(p, f)
  )

@Peter Bartholomew thanks great solution too! I adapted it to be able to return not just 0,1 values, but the actual values of the x-array. I combined it with COMBIN.SETE I provided in the UPDATE section of my question as follows:

 

COMBIN.SETPB = LAMBDA(x, m, LET(
  y, TOROW(x),
  n, COLUMNS(y),
  k, 2 ^ SEQUENCE(1, n, 0),
  h, SEQUENCE(2 ^ n),
  p, SIGN(BITAND(h, k)),
  f, BYROW(p, LAMBDA(p, SUM(p))) = m,
  z, IF(FILTER(p, f), y, NA()),
  WRAPROWS(TOCOL(z, 2), m)))

 

 

since it has been proven the most efficient way so far to convert 0,1 values to x-array.

 

Note: I guess you are using Advanced Formula Environment Add-ins, defining the formula via the Names tab, I converted it to a LAMBDA function, so it can be defined in the Module tab.

 

I tested for n=19, m=10, for the input array set:

 

SEQUENCE(,B1,10,10)

 

 

I run it several times and in all cases > 4secs. I defined the following performance function:

 

PERFORMANCE.SETPB = LAMBDA(x, y, LET(start, NOW(), out, COMBIN.SETPB(x, y), 
  IFERROR(HSTACK((NOW() - start) * 24 * 3600, out), ""))
)

 

and calling it as follows:

 

=PERFORMANCE.SETPB(SEQUENCE(,B1,10,10),B2)

 

maybe you have in mind a more efficient way to do the mapping from 0,1 values to x-array values, so we can improve your approach.

 

Note: Since it is based on a binary function BITAND which has a limit, the maximum number to represent is (2^48)-1, therefore m <= 48.

 

Thanks

@Peter Bartholomewto be fair, I tested it under the same conditions on an Excel desktop, and your solution takes less than 2secs. COMBIN.SETE around 2.2 secs. All previous tests I ran using the free version of Excel Web, I don't know why I am getting different results on each Excel version, maybe they have a different implementation of the underlying functions used, or I would need to do more stress tests. The thing is that for n=20, m=10 around 184K combinations, both solutions don't return the result after waiting several minutes.

@Peter Bartholomew another comment BYROW is less efficient than MMULT per my performance testing. I modified your approach as follows:

 

COMBIN.SETPB_A = LAMBDA(x, m,
    LET(
        y, TOROW(x),
        n, COLUMNS(y),
        k, 2 ^ SEQUENCE(1, n, 0),
        h, SEQUENCE(2 ^ n),
        p, SIGN(BITAND(h, k)),
        f, MMULT(p, SEQUENCE(n, , 1, 0)) = m,
        z, IF(FILTER(p, f), y, NA()),
        WRAPROWS(TOCOL(z, 2), m)
    )
)

 

When I tested it under Excel Web, now it provides the best performance overall, considering the limitation mentioned before about the limit of m <= 48 due to the binary function limitation, but the truth is that any solution will crash before reaching that number at least with the current computing excel capacity.

 

I used SEQUENCE(n,,1,0) over SEQUENCE(n)^0 because I tested them separately and the first one seem to be a little bit faster.

@davidleal 

I performed a speed test for the 184756 combinations returned by ²⁰C₁₂, having first removed the text conversion to leave {0,1}s.  On a Surface, core i7 laptop I got 5667 ms and a wall time of 5-6 seconds, as one might expect?

Thanks, Peter, which test did you perform? which formula?, per my understanding, the version I posted based on your solution using MMULT is the fastest one, so I am going to accept your initial solution.

@davidleal 

MMULT is suprisingly very efficient.  I've seen the limit quoted as an output of 5,460 elements for Excel 2007 and earlier. The 365 limit appears to be much greater (or limited by available memory).

  • The MMULT function returns #VALUE! if the output exceeds 5460 cells.

Description of the limitations for working with arrays in Excel - Office | Microsoft Learn

 

The most I've been able to output is 9 million elements:

 

=LET(n,3000,r,SEQUENCE(n),c,SEQUENCE(,n),MMULT(r,c))

 

@davidleal 

The formula I timed was the BYROW version, without any substitution by text or symbols.  The MMULT version is slightly faster at 5000 ms. for the  ²⁰C₁₀ case.  The combinations can't go beyond n=20 because the intermediate calculation exceeds the available row count.  Broadcasting text puts the timings up a shade once more.

=LET(
    h, 2 ^ SEQUENCE(1, n, 0),
    k, SEQUENCE(2 ^ n),
    p, SIGN(BITAND(h, k)),
    e, SEQUENCE(n, 1, 1, 0),
    f, MMULT(p, e) = m,
    FILTER(p, f)
)

@Peter Bartholomew Thanks, that is why I initially liked the recursive approach SETA, but due to REDUCE/VSTACK it is not efficient, because it generates exactly the total number of combinations, there is no need to generate more rows and then to filter, which allows going to n>=20, but because of its performance, it is not convenient. I hope Microsoft in the future improves REDUCE/VSTACK performance, it is a very useful pattern, but it is time-consuming. I added a suggestion to Microsoft in case you would like to vote it up.

1 best response

Accepted Solutions
best response confirmed by davidleal (Iron Contributor)
Solution

@davidleal 

Forgive me for asking, but have I got the wrong end of the stick here?

image.png

= IF(Combinationsλ(n,m), "●","-")

Combinationsλ
= LET(
    k, 2 ^ SEQUENCE(1, n, 0),
    h, SEQUENCE(2 ^ n),
    p, SIGN(BITAND(h, k)),
    f, BYROW(p, LAMBDA(p, SUM(p))) = m,
    FILTER(p, f)
  )

View solution in original post