Forum Discussion

VBasic2008's avatar
VBasic2008
Brass Contributor
Jan 20, 2026
Solved

Cumulative Sum of Each Column or Row

After studying various posts (answers) by members of this community, I developed a function that returns the cumulative sum of each column or row of an array:

=LAMBDA(a,[by_row],LET(
    f,IF(by_row,
        LAMBDA(b,IF(ROWS(b)=1,b,
            LAMBDA(b-VSTACK(0,DROP(TAKE(b,,-1),-1)))())),
        LAMBDA(b,IF(COLUMNS(b)=1,b,
            LAMBDA(b-HSTACK(0,DROP(TAKE(b,-1),,-1)))()))),
    IF(by_row,
        LAMBDA(f(SCAN(0,a,SUM)))(),
        LAMBDA(f(TRANSPOSE(SCAN(0,TRANSPOSE(a),SUM))))())))

My goal is maximum efficiency. I'm new to the concept of lazy evaluation, so I'm wondering if you could explain the flow in detail and point out whether any of the LAMBDA(...)() parenthesis pairs are redundant. Of course, I'm open to improvements.
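For readers tracing the logic, here is an illustrative Python analogue of what the function computes (not the Excel code itself). Note that Excel's SCAN carries its running total across the entire array in row-major order, which is why the Excel version needs a subtraction step; Python can prefix-sum each row or column independently, so no correction is required:

```python
from itertools import accumulate

def cum_sum(a, by_row=False):
    """Cumulative sum down each column (default) or across each row
    of a 2-D list -- a plain-Python model of the CumSum lambda."""
    if by_row:
        # Running total across each row: a per-row prefix sum.
        return [list(accumulate(row)) for row in a]
    # Column-wise: transpose, prefix-sum, transpose back, mirroring
    # the TRANSPOSE(SCAN(0, TRANSPOSE(a), SUM)) pattern.
    cols = [list(accumulate(col)) for col in zip(*a)]
    return [list(row) for row in zip(*cols)]

print(cum_sum([[1, 2], [3, 4]]))               # [[1, 2], [4, 6]]
print(cum_sum([[1, 2], [3, 4]], by_row=True))  # [[1, 3], [3, 7]]
```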

8 Replies

  • djclements's avatar
    djclements
    Silver Contributor

    While I can't speak to the technical aspects of lazy evaluation, I do feel there is a time and place for it (and this doesn't appear to be one of those times). In a recent discussion, The Diagonal Suite: Gentle thunking goes a long way!, it was suggested that using thunks to delay eager evaluation whenever possible in generalized Lambda development will always result in quicker calculation times. However, a simple Timer test shows this is not always the case.

    When testing your function as written with a large array, SEQUENCE(1000000,10), the results on my system were 3,500ms by_col and 3,000ms by_row on average (tested multiple times). I then modified the function definition by removing all 4 instances of LAMBDA(...)() and the average calculation times dropped to 2,500ms by_col and 2,000ms by_row. The formula I used to conduct the test was:

    =Timer(A1,LAMBDA(x,CumSum(SEQUENCE(1000000,10),x)))

    ...where A1 was a checkbox to toggle the by_row argument on and off.

    Aside from that, the function definition can also be simplified by giving the sub-function additional parameters that swap ROWS for COLUMNS and VSTACK for HSTACK (as well as the rows and [columns] arguments of TAKE and DROP), which eliminates having to check IF(by_row,...) twice:

    = LAMBDA(array,[by_row],
        LET(
            fn, LAMBDA(a,f₁,f₂,i,[j],IF(f₁(a)=1,a,a-f₂(0,DROP(TAKE(a,i,j),j,i)))),
            IF(
                by_row,
                fn(SCAN(0,array,SUM),ROWS,VSTACK,,-1),
                fn(TRANSPOSE(SCAN(0,TRANSPOSE(array),SUM)),COLUMNS,HSTACK,-1)
            )
        )
    )
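The same parameterization idea (one generic worker that receives its orientation-specific helpers as arguments, so by_row is tested only once) can be sketched in Python; this is an illustrative analogue, and the helper names are hypothetical:

```python
from itertools import accumulate

def cum_sum(a, by_row=False):
    # Choose the reshape helper once, then hand it to a single generic
    # worker -- analogous to passing VSTACK/HSTACK and the TAKE/DROP
    # arguments into one sub-function instead of branching twice.
    transpose = lambda m: [list(r) for r in zip(*m)]
    identity = lambda m: [list(r) for r in m]
    reshape = identity if by_row else transpose

    def worker(m, fix):
        # Reshape in, prefix-sum each row, reshape back out.
        return fix([list(accumulate(row)) for row in fix(m)])

    return worker(a, reshape)

print(cum_sum([[1, 2], [3, 4]]))  # [[1, 2], [4, 6]]
```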

    Furthermore, the check for a single row or column can also be eliminated by rearranging the order of VSTACK/HSTACK and DROP (stack first to prevent DROP from erroring):

    =LAMBDA(array,[by_row],
        LET(
            fn, LAMBDA(a,fx,i,[j],a-DROP(fx(0,TAKE(a,i,j)),j,i)),
            IF(
                by_row,
                fn(SCAN(0,array,SUM),VSTACK,,-1),
                fn(TRANSPOSE(SCAN(0,TRANSPOSE(array),SUM)),HSTACK,-1)
            )
        )
    )

    This also shaved an additional 200ms off the average calculation times in my tests.
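Why the reordering removes the edge case can be seen in a small Python model; the helper names here are hypothetical, and drop_last mimics Excel's DROP, which errors when asked to remove every remaining row:

```python
def drop_last(rows):
    # Model of Excel's DROP(x, -1): removing the last row of a
    # one-row array leaves nothing, which Excel treats as an error.
    if len(rows) <= 1:
        raise ValueError("DROP: no rows would remain")
    return rows[:-1]

def offsets_trim_first(last_col):
    # Original order: DROP first, then stack the leading 0.
    # Fails when the array has a single row.
    return [0] + drop_last(last_col)

def offsets_pad_first(last_col):
    # Reordered: stack the 0 first, then DROP. The padded list always
    # has at least two rows, so DROP can never error.
    return drop_last([0] + last_col)

print(offsets_pad_first([6]))        # [0] -- single-row input is fine
print(offsets_pad_first([1, 4, 6]))  # [0, 1, 4]
```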

    These are just my humble observations/suggestions. Your function is already very efficient as is. ;)

    Kind regards.

    • Patrick2788's avatar
      Patrick2788
      Silver Contributor

      Re: Gentle thunking

      Delaying eager evaluation of indices is often advantageous because it saves the calculation cost of caching those calculations with LET. An example of this is the Ulam spiral:

      Spiralλ creates a clockwise Ulam spiral by specifying the number of rings and an optional starting integer.

      Thunking the result of Lambda helpers such as BYROW/BYCOL, MAP, REDUCE, etc. usually has no benefit or slows calculations. Additionally, delaying eager evaluation of stacking functions is usually not a good idea!

      I pick and choose my spots for using thunks and thoroughly time no thunks vs. thunks before finalizing a function.

      • djclements's avatar
        djclements
        Silver Contributor

        Patrick2788​ 

        Good to know! Your Spiralλ function is definitely proof of concept... I tested it as written (all calculations delayed until delivery) against a version with every variable evaluated as it is encountered, and the delayed version was notably faster. That's surprising, to say the least, considering the sheer number of times both x() and y() are re-evaluated (which in turn causes j() and i() to re-evaluate as well).

        A simple test to prove delayed calculations are re-evaluated each time they are called is:

        =LET(f,LAMBDA(RAND()),VSTACK(f(),f()))

        which returns 2 different random numbers vs:

        =LET(f,LAMBDA(x,LAMBDA(x))(RAND()),VSTACK(f(),f()))

        which returns 2 identical random numbers.
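        The same behaviour can be modelled with Python closures; this is an illustrative analogue that uses a counter in place of RAND() so the effect is deterministic:

```python
from itertools import count

counter = count(1)
next_value = lambda: next(counter)  # stand-in for RAND(): fresh value per call

# A thunk: the body re-runs on every call, like
# =LET(f, LAMBDA(RAND()), VSTACK(f(), f()))
thunk = lambda: next_value()

# A captured value: evaluate once, then close over the result, like
# =LET(f, LAMBDA(x, LAMBDA(x))(RAND()), VSTACK(f(), f()))
captured = (lambda x: lambda: x)(next_value())

print(thunk(), thunk())        # two different values
print(captured(), captured())  # the same value twice
```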

        In the limited research I did on this topic, I read that with eager evaluation, resources (memory and CPU time) are consumed up front, which can be wasteful if the results are never needed. This is the confusing part, because every variable in your Spiralλ function is needed and used (with the exception of d_). The underlying reasoning for the performance boost is beyond my understanding, lol. Good on you for discovering this method! ;)

    • VBasic2008's avatar
      VBasic2008
      Brass Contributor

      It appears you made the function roughly 30% faster (put differently, my version was about 45% slower than yours). Thanks a lot, and thanks for the link that provided me with the means (formula) to confirm your findings.

      For the fun of it, here's a naughty version of your last formula:

      =LAMBDA(a,[by_row],LET(
          f,LAMBDA(a,f,i,[j],a-DROP(f(0,TAKE(a,i,j)),j,i)),
          IF(by_row,f(SCAN(0,a,SUM),VSTACK,,-1),
              f(TRANSPOSE(SCAN(0,TRANSPOSE(a),SUM)),HSTACK,-1))))

      I only recently discovered that one could get away with that.

      • djclements's avatar
        djclements
        Silver Contributor

        LOL, naughty, as in recycling the same variable names, a and f, in 2 locations (especially f, which defines the sub-function name, as well as one of its arguments)? Yes, this can be done, although I typically avoid it for clarity and debugging purposes. Even recursive functions defined within LET can use the same variable for the function name, as well as its combinator, e.g. =LET(fn,LAMBDA(fn,arg,... vs =LET(fn,LAMBDA(me,arg,... but I find the latter easier to read.
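        For reference, the self-passing combinator pattern described above has a direct analogue in other languages; here is a minimal Python sketch (the factorial example is hypothetical, chosen only to keep the code short):

```python
# The function receives itself as its first argument ("me"), so the
# recursive call never relies on an outer name -- the same shape as
# =LET(fn, LAMBDA(me, n, IF(n<=1, 1, n*me(me, n-1))), fn(fn, 5))
fact = lambda me, n: 1 if n <= 1 else n * me(me, n - 1)

print(fact(fact, 5))  # 120
```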

        Glad to have helped. Cheers!

  • SnowMan55's avatar
    SnowMan55
    Bronze Contributor

    My goal is maximum efficiency.

    I have no comments regarding your code (I don't want to dig into that tonight), but I want to make you aware of how you can measure calculation performance yourself in Excel for Windows. High-precision timing can be done using VBA macros and other procedures. (And those procedures can be placed in your Personal.xlsb file, so the workbook you are measuring performance in does not need to be macro-enabled.) See Excel performance - Improving calculation performance, specifically the "Measuring calculation time" section. But also read Speed of Functions vs. other Functions vs. VBA Macro.
