Recursive LAMBDA implementation of Excel's REDUCE function.

Brass Contributor

Not a question -- just sharing some interesting code.
Out of curiosity, I decided to develop an implementation of Excel's REDUCE function as a recursive Excel LAMDA function.  I was truly amazed how simple it was to write such a powerful function.

 

 

=LAMBDA(initial_value,array,CLAMBDA,
    LET(
        _00,"Implementation of REDUCE in Excel Lambda",
        vec, TOROW(array),
        rec_L, LAMBDA(acc,cindex,rec_LL,
               LET(
                  cvalue,  INDEX( vec,0,cindex ),
                  new_acc,  CLAMBDA( acc,  cvalue  ),
                   IF( cindex = COLUMNS(vec),
                            new_acc,
                            rec_LL(new_acc,cindex+1,rec_LL )
                   )
               )
                      ),
         rec_L( initial_value,1,rec_L)
        )
 )(0, {1,2,3}, LAMBDA(acc,cv, acc+cv))

 

 

 

Understanding the recursive implementation REDUCE, will allow one to create custom REDUCE functions should your need one.

Note -- the recursive lambda function "rec_L" above does not need to be externally named. 
The recursive reference to rec_L  occurs at "run time" by including the "rec_l" reference as a parameter to itself :    rec_L(initial_value, 1,rec_L)   (line 15)

 

18 Replies
Interesting. Of course using recursion may cause you to hit the recursion limit. Perhaps sooner than hitting limits with REDUCE?
What's that CLAMBDA in your formula?

HI@JKPieterse 


Re: your question what is CLAMBDA?

 

recursive LAMBDA PARAMETERS  are : [initial_value,  array,    CLAMBDA ]
recursive LAMBDA ARGUMENTS   are:  (0,                  {1,2,3},   LAMBDA(acc,cv, acc+cv) ]

 

Mapping PARAMETERS to ARGUMENTS, we get:
initial_value => 0
array           => {1,2,3}
CLAMBDA  => LAMBDA(acc,cv, acc+cv)

CLAMBDA is a LAMBDA function that tells REDUCE how to combine the previous accumulator with the current_value, to produce the new accumulator. In my example I am just summing [accumulator + current_value] which results in the sum of the inputted array.


Re: Recursion LIMITS

You are correct -- the maximum vector size I was able to use in the recursive function was about 2,700
while in the real Excel REDUCE I could easily use a vector size of 1 million
=REDUCE(0, SEQUENCE(1,1000000,1,1), LAMBDA(acc,cv, acc+cv))


Lastly:


this code demonstrates how one COULD write Excel's REDUCE in as a Recursive Excel LAMBDA function. It in no way implies one should write it this way.  However, if you need to do looping where the number of repetitions is unknown, you will likely need to use recursion.

Hope your questions were answered.

 

 

@LBROWN7 

Just in case, current recursion stake limit is 16384. It is calculated as number of iterations x number of operands. Still not sure how operands are counted. It looks like you have 6 of them.

As for the REDUCE have no idea about the limit. In your sample that's more limit of SEQUENCE by grid size (1 048 576). If less we could have #SPILL! in case of columns, but if more we have #VALUE! in any case.

As for the idea itself it's great to demonstrate lambda self-calling. That was really actual when we have no helper functions and AFE. With them I'm not sure what are the pros of using LAMBDA/ME technique.

 Ongelmia sisäänkirjaamisessa. Yritä myöhemmin uudelleen. Details
Hi Sergei:

Thanks for the info -- I was totally unaware that there was a way to calculate recursion limits ( aside from empirical testing)

Also, took me a while to figure out what you meant by LAMDA/ME. Going forward I will definately use ME for the recursive function name in the parameter list ( very descriptive)

As to your last comment regarding the utility of LAMDA/ME versus using helper functions. At this point the primary utility of LAMBDA/ME technique in this context is educational,

Thanks for your post.

@LBROWN7 

Interesting method using LET recursively like that - thanks for posting.
One way to work around stack limits could be to use a binary tree method:

REDUCE2 = LAMBDA(initial_value, array, function,
    IF(
        COLUMNS(array) = 1,
        function(initial_value, array),
        REDUCE2(
            REDUCE2(
                initial_value,
                TAKE(array, , COLUMNS(array) / 2),
                function
            ),
            DROP(array, , COLUMNS(array) / 2),
            function
        )
    )
)

 

For a 2D version of BYROW (faster than REDUCE/VSTACK alternative):

BYROW2 = LAMBDA(array, function,
    IF(
        ROWS(array) = 1,
        function(array),
        VSTACK(
            BYROW2(TAKE(array, ROWS(array) / 2), function),
            BYROW2(DROP(array, ROWS(array) / 2), function)
        )
    )
)


These are both based on examples shared by @PeterBartholomew1 

@LBROWN7 

Sorry, LAMBDA/ME is just the slang used at the beginning of lambdas in Excel. That's close to what you suggested. If take the latest of suggested by @lori_m functions and put it in the cell without given a name, that could be like

=LAMBDA(array,function,
    LET(
        byRow2, LAMBDA(ME,arr,func,
        IF( ROWS(arr) = 1, func(arr),
            VSTACK(
                ME( ME, TAKE(arr, ROWS(arr) / 2), func  ),
                ME( ME, DROP(arr, ROWS(arr) / 2), func ) )
        ) ),
        byRow2(byRow2, array, function) )
)(range, fn)

or if not to wrap with lambda which, IMHO, more logical for in-cell formula

=LET(
 byRow2, LAMBDA(ME,arr,func,
   IF( ROWS(arr) = 1,
       func(arr),
       VSTACK(
         ME( ME, TAKE(arr, ROWS(arr) / 2), func  ),
         ME( ME, DROP(arr, ROWS(arr) / 2), func ) )
   )
 ),
byRow2(byRow2, array, fn) )

@lori_m 

@OwenPrice posted a discussion as an article

(2) Excel LAMBDA Spotlight: Bisected Map with BMAPλ | LinkedIn

The focus there is to stack mapped arrays but with minor changes one can apply a closely-related function to a SCAN with array outputs.  All that is needed is to pass a term from the prior stack to initialize the following one.

 

Now as we can get around the unfortunate 'nested array' and 'array of array' issues with recursion, perhaps it is time to go back to Microsoft with the request that they implement such functionality as native formulas.  As I travel in the direction of building ever larger elements of any solution within a single Lambda function, hitting these limitations is becoming the norm rather than the exception.

 

 

 

@PeterBartholomew1 

I was vaguely aware of a recommendation for a lambda spotlight article 🙂 A modification for SCAN allowing for column vectors might be,

SCAN2=LAMBDA(initial_value, array, function,
    IF(COLUMNS(array) = 1,
        function(initial_value, array),
        LET(
            acc, SCAN2(initial_value, DROP(array, , -COLUMNS(array)/2), function),
            HSTACK(acc, 
                SCAN2(TAKE(acc, , -1), TAKE(array, , -COLUMNS(array)/2), function)
            )
        )
    )
)

Then the formula in the other @LBROWN7 thread returns expected results (with NAs),
=SCAN2({1},{2,3,4},LAMBDA(a,b,VSTACK(a,b)))


I'd think requesting array versions of map and scan functions would be sufficient for a majority of use cases. Going with current trends XMAP and XSCAN might be sufficient if able to accommodate 2D input arrays in row major order. 


An advantage of the anonymous recursion implementation is that unchanged variables may be moved outside the recursive definition as in the 'vec' definition in the original post. Additionally, perhaps we can remove the 'func' definition and just use function(array) in the first @SergeiBaklan  ByRow2 definition.

@lori_m 

That could be great workaround. However, performance with SCAN is still not good. Tried poor thunking

SCAN3 = LAMBDA(initial_value, array, function,
    LET(
        sc, SCAN(LAMBDA(initial_value), array, LAMBDA(a, v, LAMBDA(function(a(), v)))),
        n, COLUMNS(sc),
        res, LAMBDA(ME, k,
            IF( k = 1,
                INDEX(sc, 1, 1)(),
                HSTACK(ME(ME, k - 1), INDEX(sc, 1, k)() ))),
        res(res, n)
    )
)

but that gives nothing.

Native function is required. Not sure Microsoft will do it, it looks like all efforts now are concentrated on Python.

 

@SergeiBaklan

I hope we do get array versions of lambda helper functions at some point - many people have been asking and these type of workarounds are not exactly intuitive. In some tests, I found the SCAN2 function does slow down but wasn't too bad - for example it took about a second for a 1000x1000 matrix,

 

=SCAN2(1,SEQUENCE(,1000,2),LAMBDA(a,b,VSTACK(a,b)))

 

For comparison, the Python equivalent works fine with nested arrays:

 

from itertools import accumulate
pd.DataFrame(accumulate(range(2,1001), lambda a,i:a+[i],initial=[1])).T

 

In a Jupyter notebook this runs very fast (70ms) but within Excel there was a long delay retrieving data (around 20s). I have mixed feelings about python in Excel - and will mainly be sticking with notebooks for now. It's great to have the extra functionality at one's fingertips but I'd still choose a lambda solution if available being seamlessly bound with the spreadsheet structure: range references, grid calc, security,  etc.

 

@lori_m 

Sorry, I was wrong. Added removing of #N/A to the formula and that eats all the time. Without it about the second for SCAN2.

Python in my case shows

CPU times: user 160 ms, sys: 1.81 ms, total: 162 ms Wall time: 209 ms

plus about 10 sec for the delivery.

Practically the same.

By the way, in one of my first exercises I unintentionally generated with numpy nested array an recognized that only with familiar #CALC! error returning it to the grid.

Agree, Python won't substitute lambdas. What I tried to say is that Excel team right now and in nearest future is more concentrated on Python (my impression), it'll be not enough money and time to fix nested arrays issue.

HI @SergeiBaklan

I added an explicit stack parameter to the recursive scan and that seems to do the trick.

 

=LAMBDA(initial_value, array, CLAMBDA,
    LET(
        _00, "Implementation of SCAN in Excel Lambda",
        _01, "accumulator must be a scalar or  column vector"
        vec, TOROW(array),
        rec_L, LAMBDA(stack, acc, cindex, rec_LL,
            LET(
                cvalue, INDEX(vec, 0, cindex),
                new_acc, CLAMBDA(acc, cvalue),
                new_stack, HSTACK(stack, new_acc),
                IF(
                    cindex = COLUMNS(vec),
                    new_stack,
                    rec_LL(new_stack, new_acc, cindex + 1, rec_LL)
                )
            )
        ),
        tresult, (rec_L(initial_value, initial_value, 1, rec_L)),
        MAP(tresult, LAMBDA(v, IF(ISERROR(v), "", v)))
    )
)({1}, {2, 3, 4}, LAMBDA(a, b, VSTACK(a, b)))

 


Probably need to add code to drop the 1st column - but above seems to be moving in the right direction.

1111
 222
  33
   4



Here is a correct version of the above, with the correct stack initialization.  I think this is a valid scan function.

=LAMBDA(initial_value, array, CLAMBDA,
    LET(
        _00, "Implementation of SCAN in Excel Lambda",
        vec, TOROW(array),
        rec_L, LAMBDA(stack, acc, cindex, rec_LL,
            LET(
                cvalue, INDEX(vec, 0, cindex),
                new_acc, CLAMBDA(acc, cvalue),
                new_stack, IF(cindex = 1, new_acc, HSTACK(stack, new_acc)),
                IF(cindex = COLUMNS(vec), new_stack, rec_LL(new_stack, new_acc, cindex + 1, rec_LL))
            )
        ),
        tresult, (rec_L("", initial_value, 1, rec_L)),
        MAP(tresult, LAMBDA(v, IF(ISERROR(v), "", v)))
    )
)({1, 2}, {2, 3, 4}, LAMBDA(a, b, VSTACK(a, b)))

@SergeiBaklan 

I don't think we need full support for array of arrays, just maybe additional parameters to SCAN / BYROW / BYCOL to indicate array support. If set, the function would need to use hstack({a},{b}) in place of {{a},{b}} like in workaround formulas. The way these functions are implemented currently may require a fixed output array size allowing for efficient array processing (e.g. modifying in-place) so there could be a valid reason to add an optional parameter here.

 

By the way, python accumulate function uses the same convention as above for first column - I just needed to add a comma to second comment for @LBROWN7 formula to parse. This worked ok on hundred records but hung for me on a thousand. Also I think IFNA(...,"") could be sufficient to deal with NA results - no need for MAP - or equivalently .fillna('') in python.

@lori_m 

If function by function - maybe. If Microsoft goes this way most probably new functions like SCAN.EXT will be introduced with usual deployment cycle from Beta to production.

Performance is the key here. So far only @PeterBartholomew1 approach gives more or less acceptable results on large arrays. As for IFNA() my impression it reduces performance dramatically. MAP() even more.

@SergeiBaklan 

I had in mind wrapping the result with IFNA(SCAN2(...),"") rather than inserting extra conditions into the recursive definition which would likely have a considerable impact to performance. Using the following recursive let implementation also gave the same results as the @LBROWN7 second formula

=LAMBDA(initial_value, array, function,
    LET(
        ret, LAMBDA(fn, initial_value, array,
            IF(
                COLUMNS(array) = 1,
                function(initial_value, array),
                LET(
                    acc, fn(fn, initial_value, DROP(array, , -COLUMNS(array) / 2)),
                    HSTACK(
                        acc,
                        fn(fn, TAKE(acc, , -COLUMNS(initial_value)), TAKE(array, , -COLUMNS(array) / 2))
                    )
                )
            )
        ),
        IFNA(ret(ret, initial_value, array), "")
    )
)({1, 2}, {2, 3, 4}, LAMBDA(a, b, VSTACK(a, b)))

P.S. I found this useful for the python code: Anonymous recursion - Wikipedia

 

 

@lori_m 

Great, it practically doesn't affect performance compare to initial SCAN2.

Thank you for the link, make the bookmark to read more carefully.