Forum Discussion
Recursive LAMBDA implementation of Excel's REDUCE function.
I was vaguely aware of a recommendation for a lambda spotlight article 🙂 A modification for SCAN allowing for column vectors might be,
SCAN2=LAMBDA(initial_value, array, function,
IF(COLUMNS(array) = 1,
function(initial_value, array),
LET(
acc, SCAN2(initial_value, DROP(array, , -COLUMNS(array)/2), function),
HSTACK(acc,
SCAN2(TAKE(acc, , -1), TAKE(array, , -COLUMNS(array)/2), function)
)
)
)
)
Then the formula in the other LBROWN7 thread returns expected results (with NAs),
=SCAN2({1},{2,3,4},LAMBDA(a,b,VSTACK(a,b)))
I'd think requesting array versions of map and scan functions would be sufficient for a majority of use cases. Going with current trends XMAP and XSCAN might be sufficient if able to accommodate 2D input arrays in row major order.
An advantage of the anonymous recursion implementation is that unchanged variables may be moved outside the recursive definition as in the 'vec' definition in the original post. Additionally, perhaps we can remove the 'func' definition and just use function(array) in the first SergeiBaklan ByRow2 definition.
That could be great workaround. However, performance with SCAN is still not good. Tried poor thunking
SCAN3 = LAMBDA(initial_value, array, function,
LET(
sc, SCAN(LAMBDA(initial_value), array, LAMBDA(a, v, LAMBDA(function(a(), v)))),
n, COLUMNS(sc),
res, LAMBDA(ME, k,
IF( k = 1,
INDEX(sc, 1, 1)(),
HSTACK(ME(ME, k - 1), INDEX(sc, 1, k)() ))),
res(res, n)
)
)
but that gives nothing.
Native function is required. Not sure Microsoft will do it, it looks like all efforts now are concentrated on Python.
- LBROWN7Oct 13, 2023Brass Contributor
HI SergeiBaklan
I added an explicit stack parameter to the recursive scan and that seems to do the trick.=LAMBDA(initial_value, array, CLAMBDA, LET( _00, "Implementation of SCAN in Excel Lambda", _01, "accumulator must be a scalar or column vector" vec, TOROW(array), rec_L, LAMBDA(stack, acc, cindex, rec_LL, LET( cvalue, INDEX(vec, 0, cindex), new_acc, CLAMBDA(acc, cvalue), new_stack, HSTACK(stack, new_acc), IF( cindex = COLUMNS(vec), new_stack, rec_LL(new_stack, new_acc, cindex + 1, rec_LL) ) ) ), tresult, (rec_L(initial_value, initial_value, 1, rec_L)), MAP(tresult, LAMBDA(v, IF(ISERROR(v), "", v))) ) )({1}, {2, 3, 4}, LAMBDA(a, b, VSTACK(a, b)))
Probably need to add code to drop the 1st column - but above seems to be moving in the right direction.1 1 1 1 2 2 2 3 3 4 - LBROWN7Oct 14, 2023Brass Contributor
Here is a correct version of the above, with the correct stack initialization. I think this is a valid scan function.
=LAMBDA(initial_value, array, CLAMBDA, LET( _00, "Implementation of SCAN in Excel Lambda", vec, TOROW(array), rec_L, LAMBDA(stack, acc, cindex, rec_LL, LET( cvalue, INDEX(vec, 0, cindex), new_acc, CLAMBDA(acc, cvalue), new_stack, IF(cindex = 1, new_acc, HSTACK(stack, new_acc)), IF(cindex = COLUMNS(vec), new_stack, rec_LL(new_stack, new_acc, cindex + 1, rec_LL)) ) ), tresult, (rec_L("", initial_value, 1, rec_L)), MAP(tresult, LAMBDA(v, IF(ISERROR(v), "", v))) ) )({1, 2}, {2, 3, 4}, LAMBDA(a, b, VSTACK(a, b)))
- lori_mOct 13, 2023Steel Contributor
I hope we do get array versions of lambda helper functions at some point - many people have been asking and these type of workarounds are not exactly intuitive. In some tests, I found the SCAN2 function does slow down but wasn't too bad - for example it took about a second for a 1000x1000 matrix,
=SCAN2(1,SEQUENCE(,1000,2),LAMBDA(a,b,VSTACK(a,b)))
For comparison, the Python equivalent works fine with nested arrays:
from itertools import accumulate pd.DataFrame(accumulate(range(2,1001), lambda a,i:a+[i],initial=[1])).T
In a Jupyter notebook this runs very fast (70ms) but within Excel there was a long delay retrieving data (around 20s). I have mixed feelings about python in Excel - and will mainly be sticking with notebooks for now. It's great to have the extra functionality at one's fingertips but I'd still choose a lambda solution if available being seamlessly bound with the spreadsheet structure: range references, grid calc, security, etc.
- SergeiBaklanOct 13, 2023MVP
Sorry, I was wrong. Added removing of #N/A to the formula and that eats all the time. Without it about the second for SCAN2.
Python in my case shows
CPU times: user 160 ms, sys: 1.81 ms, total: 162 ms Wall time: 209 ms
plus about 10 sec for the delivery.
Practically the same.
By the way, in one of my first exercises I unintentionally generated with numpy nested array an recognized that only with familiar #CALC! error returning it to the grid.
Agree, Python won't substitute lambdas. What I tried to say is that Excel team right now and in nearest future is more concentrated on Python (my impression), it'll be not enough money and time to fix nested arrays issue.
- lori_mOct 14, 2023Steel Contributor
I don't think we need full support for array of arrays, just maybe additional parameters to SCAN / BYROW / BYCOL to indicate array support. If set, the function would need to use hstack({a},{b}) in place of {{a},{b}} like in workaround formulas. The way these functions are implemented currently may require a fixed output array size allowing for efficient array processing (e.g. modifying in-place) so there could be a valid reason to add an optional parameter here.
By the way, python accumulate function uses the same convention as above for first column - I just needed to add a comma to second comment for LBROWN7 formula to parse. This worked ok on hundred records but hung for me on a thousand. Also I think IFNA(...,"") could be sufficient to deal with NA results - no need for MAP - or equivalently .fillna('') in python.