Oct 09 2023 12:44 PM - edited Oct 09 2023 01:02 PM
Not a question -- just sharing some interesting code.
Out of curiosity, I decided to develop an implementation of Excel's REDUCE function as a recursive Excel LAMDA function. I was truly amazed how simple it was to write such a powerful function.
=LAMBDA(initial_value,array,CLAMBDA,
LET(
_00,"Implementation of REDUCE in Excel Lambda",
vec, TOROW(array),
rec_L, LAMBDA(acc,cindex,rec_LL,
LET(
cvalue, INDEX( vec,0,cindex ),
new_acc, CLAMBDA( acc, cvalue ),
IF( cindex = COLUMNS(vec),
new_acc,
rec_LL(new_acc,cindex+1,rec_LL )
)
)
),
rec_L( initial_value,1,rec_L)
)
)(0, {1,2,3}, LAMBDA(acc,cv, acc+cv))
Understanding the recursive implementation REDUCE, will allow one to create custom REDUCE functions should your need one.
Note -- the recursive lambda function "rec_L" above does not need to be externally named.
The recursive reference to rec_L occurs at "run time" by including the "rec_l" reference as a parameter to itself : rec_L(initial_value, 1,rec_L) (line 15)
Oct 10 2023 05:19 AM
Oct 10 2023 08:10 AM
Re: your question what is CLAMBDA?
recursive LAMBDA PARAMETERS are : [initial_value, array, CLAMBDA ]
recursive LAMBDA ARGUMENTS are: (0, {1,2,3}, LAMBDA(acc,cv, acc+cv) ]
Mapping PARAMETERS to ARGUMENTS, we get:
initial_value => 0
array => {1,2,3}
CLAMBDA => LAMBDA(acc,cv, acc+cv)
CLAMBDA is a LAMBDA function that tells REDUCE how to combine the previous accumulator with the current_value, to produce the new accumulator. In my example I am just summing [accumulator + current_value] which results in the sum of the inputted array.
Re: Recursion LIMITS
You are correct -- the maximum vector size I was able to use in the recursive function was about 2,700
while in the real Excel REDUCE I could easily use a vector size of 1 million
=REDUCE(0, SEQUENCE(1,1000000,1,1), LAMBDA(acc,cv, acc+cv))
Lastly:
this code demonstrates how one COULD write Excel's REDUCE in as a Recursive Excel LAMBDA function. It in no way implies one should write it this way. However, if you need to do looping where the number of repetitions is unknown, you will likely need to use recursion.
Hope your questions were answered.
Oct 10 2023 09:21 AM
Just in case, current recursion stake limit is 16384. It is calculated as number of iterations x number of operands. Still not sure how operands are counted. It looks like you have 6 of them.
As for the REDUCE have no idea about the limit. In your sample that's more limit of SEQUENCE by grid size (1 048 576). If less we could have #SPILL! in case of columns, but if more we have #VALUE! in any case.
As for the idea itself it's great to demonstrate lambda self-calling. That was really actual when we have no helper functions and AFE. With them I'm not sure what are the pros of using LAMBDA/ME technique.
Oct 10 2023 09:22 AM
Oct 10 2023 01:08 PM
Oct 12 2023 03:24 AM
Interesting method using LET recursively like that - thanks for posting.
One way to work around stack limits could be to use a binary tree method:
REDUCE2 = LAMBDA(initial_value, array, function,
IF(
COLUMNS(array) = 1,
function(initial_value, array),
REDUCE2(
REDUCE2(
initial_value,
TAKE(array, , COLUMNS(array) / 2),
function
),
DROP(array, , COLUMNS(array) / 2),
function
)
)
)
For a 2D version of BYROW (faster than REDUCE/VSTACK alternative):
BYROW2 = LAMBDA(array, function,
IF(
ROWS(array) = 1,
function(array),
VSTACK(
BYROW2(TAKE(array, ROWS(array) / 2), function),
BYROW2(DROP(array, ROWS(array) / 2), function)
)
)
)
These are both based on examples shared by @PeterBartholomew1
Oct 12 2023 06:47 AM
Sorry, LAMBDA/ME is just the slang used at the beginning of lambdas in Excel. That's close to what you suggested. If take the latest of suggested by @lori_m functions and put it in the cell without given a name, that could be like
=LAMBDA(array,function,
LET(
byRow2, LAMBDA(ME,arr,func,
IF( ROWS(arr) = 1, func(arr),
VSTACK(
ME( ME, TAKE(arr, ROWS(arr) / 2), func ),
ME( ME, DROP(arr, ROWS(arr) / 2), func ) )
) ),
byRow2(byRow2, array, function) )
)(range, fn)
or if not to wrap with lambda which, IMHO, more logical for in-cell formula
=LET(
byRow2, LAMBDA(ME,arr,func,
IF( ROWS(arr) = 1,
func(arr),
VSTACK(
ME( ME, TAKE(arr, ROWS(arr) / 2), func ),
ME( ME, DROP(arr, ROWS(arr) / 2), func ) )
)
),
byRow2(byRow2, array, fn) )
Oct 12 2023 06:47 AM
@OwenPrice posted a discussion as an article
(2) Excel LAMBDA Spotlight: Bisected Map with BMAPλ | LinkedIn
The focus there is to stack mapped arrays but with minor changes one can apply a closely-related function to a SCAN with array outputs. All that is needed is to pass a term from the prior stack to initialize the following one.
Now as we can get around the unfortunate 'nested array' and 'array of array' issues with recursion, perhaps it is time to go back to Microsoft with the request that they implement such functionality as native formulas. As I travel in the direction of building ever larger elements of any solution within a single Lambda function, hitting these limitations is becoming the norm rather than the exception.
Oct 12 2023 01:37 PM
I was vaguely aware of a recommendation for a lambda spotlight article 🙂 A modification for SCAN allowing for column vectors might be,
SCAN2=LAMBDA(initial_value, array, function,
IF(COLUMNS(array) = 1,
function(initial_value, array),
LET(
acc, SCAN2(initial_value, DROP(array, , -COLUMNS(array)/2), function),
HSTACK(acc,
SCAN2(TAKE(acc, , -1), TAKE(array, , -COLUMNS(array)/2), function)
)
)
)
)
Then the formula in the other @LBROWN7 thread returns expected results (with NAs),
=SCAN2({1},{2,3,4},LAMBDA(a,b,VSTACK(a,b)))
I'd think requesting array versions of map and scan functions would be sufficient for a majority of use cases. Going with current trends XMAP and XSCAN might be sufficient if able to accommodate 2D input arrays in row major order.
An advantage of the anonymous recursion implementation is that unchanged variables may be moved outside the recursive definition as in the 'vec' definition in the original post. Additionally, perhaps we can remove the 'func' definition and just use function(array) in the first @SergeiBaklan ByRow2 definition.
Oct 13 2023 08:12 AM
That could be great workaround. However, performance with SCAN is still not good. Tried poor thunking
SCAN3 = LAMBDA(initial_value, array, function,
LET(
sc, SCAN(LAMBDA(initial_value), array, LAMBDA(a, v, LAMBDA(function(a(), v)))),
n, COLUMNS(sc),
res, LAMBDA(ME, k,
IF( k = 1,
INDEX(sc, 1, 1)(),
HSTACK(ME(ME, k - 1), INDEX(sc, 1, k)() ))),
res(res, n)
)
)
but that gives nothing.
Native function is required. Not sure Microsoft will do it, it looks like all efforts now are concentrated on Python.
Oct 13 2023 01:35 PM - edited Oct 13 2023 01:40 PM
I hope we do get array versions of lambda helper functions at some point - many people have been asking and these type of workarounds are not exactly intuitive. In some tests, I found the SCAN2 function does slow down but wasn't too bad - for example it took about a second for a 1000x1000 matrix,
=SCAN2(1,SEQUENCE(,1000,2),LAMBDA(a,b,VSTACK(a,b)))
For comparison, the Python equivalent works fine with nested arrays:
from itertools import accumulate
pd.DataFrame(accumulate(range(2,1001), lambda a,i:a+[i],initial=[1])).T
In a Jupyter notebook this runs very fast (70ms) but within Excel there was a long delay retrieving data (around 20s). I have mixed feelings about python in Excel - and will mainly be sticking with notebooks for now. It's great to have the extra functionality at one's fingertips but I'd still choose a lambda solution if available being seamlessly bound with the spreadsheet structure: range references, grid calc, security, etc.
Oct 13 2023 02:25 PM
Sorry, I was wrong. Added removing of #N/A to the formula and that eats all the time. Without it about the second for SCAN2.
Python in my case shows
CPU times: user 160 ms, sys: 1.81 ms, total: 162 ms Wall time: 209 ms
plus about 10 sec for the delivery.
Practically the same.
By the way, in one of my first exercises I unintentionally generated with numpy nested array an recognized that only with familiar #CALC! error returning it to the grid.
Agree, Python won't substitute lambdas. What I tried to say is that Excel team right now and in nearest future is more concentrated on Python (my impression), it'll be not enough money and time to fix nested arrays issue.
Oct 13 2023 03:14 PM
HI @SergeiBaklan
I added an explicit stack parameter to the recursive scan and that seems to do the trick.
=LAMBDA(initial_value, array, CLAMBDA,
LET(
_00, "Implementation of SCAN in Excel Lambda",
_01, "accumulator must be a scalar or column vector"
vec, TOROW(array),
rec_L, LAMBDA(stack, acc, cindex, rec_LL,
LET(
cvalue, INDEX(vec, 0, cindex),
new_acc, CLAMBDA(acc, cvalue),
new_stack, HSTACK(stack, new_acc),
IF(
cindex = COLUMNS(vec),
new_stack,
rec_LL(new_stack, new_acc, cindex + 1, rec_LL)
)
)
),
tresult, (rec_L(initial_value, initial_value, 1, rec_L)),
MAP(tresult, LAMBDA(v, IF(ISERROR(v), "", v)))
)
)({1}, {2, 3, 4}, LAMBDA(a, b, VSTACK(a, b)))
Probably need to add code to drop the 1st column - but above seems to be moving in the right direction.
1 | 1 | 1 | 1 |
2 | 2 | 2 | |
3 | 3 | ||
4 |
Oct 13 2023 10:53 PM
Here is a correct version of the above, with the correct stack initialization. I think this is a valid scan function.
=LAMBDA(initial_value, array, CLAMBDA,
LET(
_00, "Implementation of SCAN in Excel Lambda",
vec, TOROW(array),
rec_L, LAMBDA(stack, acc, cindex, rec_LL,
LET(
cvalue, INDEX(vec, 0, cindex),
new_acc, CLAMBDA(acc, cvalue),
new_stack, IF(cindex = 1, new_acc, HSTACK(stack, new_acc)),
IF(cindex = COLUMNS(vec), new_stack, rec_LL(new_stack, new_acc, cindex + 1, rec_LL))
)
),
tresult, (rec_L("", initial_value, 1, rec_L)),
MAP(tresult, LAMBDA(v, IF(ISERROR(v), "", v)))
)
)({1, 2}, {2, 3, 4}, LAMBDA(a, b, VSTACK(a, b)))
Oct 14 2023 01:17 AM
I don't think we need full support for array of arrays, just maybe additional parameters to SCAN / BYROW / BYCOL to indicate array support. If set, the function would need to use hstack({a},{b}) in place of {{a},{b}} like in workaround formulas. The way these functions are implemented currently may require a fixed output array size allowing for efficient array processing (e.g. modifying in-place) so there could be a valid reason to add an optional parameter here.
By the way, python accumulate function uses the same convention as above for first column - I just needed to add a comma to second comment for @LBROWN7 formula to parse. This worked ok on hundred records but hung for me on a thousand. Also I think IFNA(...,"") could be sufficient to deal with NA results - no need for MAP - or equivalently .fillna('') in python.
Oct 14 2023 07:59 AM
If function by function - maybe. If Microsoft goes this way most probably new functions like SCAN.EXT will be introduced with usual deployment cycle from Beta to production.
Performance is the key here. So far only @PeterBartholomew1 approach gives more or less acceptable results on large arrays. As for IFNA() my impression it reduces performance dramatically. MAP() even more.
Oct 16 2023 07:06 AM
I had in mind wrapping the result with IFNA(SCAN2(...),"") rather than inserting extra conditions into the recursive definition which would likely have a considerable impact to performance. Using the following recursive let implementation also gave the same results as the @LBROWN7 second formula
=LAMBDA(initial_value, array, function,
LET(
ret, LAMBDA(fn, initial_value, array,
IF(
COLUMNS(array) = 1,
function(initial_value, array),
LET(
acc, fn(fn, initial_value, DROP(array, , -COLUMNS(array) / 2)),
HSTACK(
acc,
fn(fn, TAKE(acc, , -COLUMNS(initial_value)), TAKE(array, , -COLUMNS(array) / 2))
)
)
)
),
IFNA(ret(ret, initial_value, array), "")
)
)({1, 2}, {2, 3, 4}, LAMBDA(a, b, VSTACK(a, b)))
P.S. I found this useful for the python code: Anonymous recursion - Wikipedia
Oct 16 2023 03:32 PM
Great, it practically doesn't affect performance compare to initial SCAN2.
Thank you for the link, make the bookmark to read more carefully.