Forum Discussion

Brass Contributor

Jul 27, 2024

Solved

Generating a 2D array from sumifs on the columns of another 2D array

I have a 2D array of costs by month and want a sumif equivalent based on another dynamic array of Divisions see below. i.e. Jan Div 1 is in M2 so I can refer to this as M2# to get the full table Jan ...

excel

Formulas and Functions

djclements
Jul 28, 2024
James_Buist As a variant, here's a couple of SUMIFS options:

=LET( arr, LAMBDA(n, CHOOSE(n, K10#, SEQUENCE(, COLUMNS(M2#)))), MAP(arr({1}), arr({2}), LAMBDA(r,c, SUMIFS(INDEX(M2#,, c), K2#, r))) )

-OR-

=DROP(REDUCE(0, SEQUENCE(COLUMNS(M2#)), LAMBDA(p,c, HSTACK(p, SUMIFS(INDEX(M2#,, c), K2#, K10#)))),, 1)

The latter should be more efficient because there are fewer iterations, but you probably won't notice a difference on a relatively small dataset.

Having said that, I'd be more interested to see how the actual data source is organized. Obviously, cells K2 and M2 contain formulas that reference another dataset, because you're using the spilled range operator (#). If the data source was arranged vertically in a table with columns for "Division", "Month" and "Amount", you could easily summarize the data using a single SUMIFS formula or a pivot table.

Please see the attached example file, if needed...
2D_SUMIFS_Examples.xlsx11 KB

James_Buist

Brass Contributor

Jul 28, 2024

I tested the three options with 5000 rows and 180 columns. The MAKEARRAY solution took around 5 seconds to calculate / process. ie clicking in the cell with a SPILL error to having a display of the results was around 5s and similar time for a full alt ctrl F9 calc. In addition, it would need additional handling for non matches ie if a division has no costs as these result in a calc error

On the other hand, it is impossible to tell what the performance difference is between REDUCE and the LET with MAP as they are both lightning fast. Clicking in the cell with a spill error and tabbing out results in immediate display of results and with a full calc (I'm using manual calc) the calc flashes up then goes. Think I'd need to test with 100K rows to get a real gauge.
So I will stick with the LET & MAP which remain the nest solutions. either.

djclements

Bronze Contributor

Jul 29, 2024

James_Buist MMULT is also a possibility:

=MMULT(--(K10# = TOROW(K2#)), M2#)

This is both the shortest and most efficient method for your layout. However, since you "need the simplest possible solution that others later may understand", it may or may not be the way to go, as it's not the easiest method to understand and/or modified to meet new criteria.

Regarding the MAP / CHOOSE method, this was offered as a relatively efficient alternative to the MAKEARRAY / INDEX method, but not necessarily the most efficient method overall. As with most situations in Excel, there are many different ways to get the job done. Ultimately, it comes down to the method you prefer, while maintaining efficiency relative to the size of the dataset you're working with.

To demonstrate the difference in calculation speed between MAP / CHOOSE and MAKEARRAY / INDEX, try the following comparison test:

=LET(
    arr, LAMBDA(n, CHOOSE(n, SEQUENCE(1000,,,2), SEQUENCE(,1000,,5))),
    MAP(arr({1}), arr({2}), LAMBDA(r,c, r+c))
)

\\compared to:

=LET(
    rs, SEQUENCE(1000,,,2),
    cs, SEQUENCE(,1000,,5),
    MAKEARRAY(1000, 1000, LAMBDA(r,c, INDEX(rs,r)+INDEX(cs,c)))
)

On my system, MAP took only 1 second to return the results, whereas MAKEARRAY took almost 30.

(Obviously =SEQUENCE(1000,,,2)+SEQUENCE(,1000,,5) is all that's needed here, but that would defeat the purpose of the test, lol.)

Cheers!