A generalised Lambda helper function that return arrays of arrays using bisection.

Silver Contributor

 

Specifically this required days of the week to be sorted by multiple price criteria.

[This is intended as a topic for discussion as opposed to a specific request for help]

 

I used the problem to develop a Lambda function that uses a binary tree to select the rows at the leaf nodes and perform a straightforward sort.  The results then get stacked pairwise until the blue table of results is returned

image.png

 

 

BYROWλ
"Applies a user-defined function by row and collects array results into a stacked array"
=LET(resultϑ, BYROW(array, ThunkFnλ), BinaryTreeλ(ROWS(resultϑ), "", Derefϑarrλ))

 

 

The main bisection recursion is hidden from the user

 

 

BinaryTreeλ
"Generates binary numbers stacking blocks pairwise to an upper limit"
=LET(
    maxNode, BASE(nodeCount - 1, 2),
    maxLevel, LEN(maxNode),
    level, LEN(parentNode),
    maxChild, LEFT(maxNode, 1 + level),
    IF(
        level < maxLevel - 1,
        LET(
            childNode0, BinaryTreeλ(nodeCount, parentNode & 0, FNλ),
            childNode1, BinaryTreeλ(nodeCount, parentNode & 1, FNλ),
            IF((parentNode & 0) = maxChild, childNode0, VSTACK(childNode0, childNode1))
        ),
        IF(
            (parentNode & 0) = maxChild,
            FNλ(parentNode & 0),
            VSTACK(FNλ(parentNode & 0), FNλ(parentNode & 1))
        )
    )
)

 

 

but it is intended to non-problem specific and hence reusable.

A key feature is that the user provides a Lambda function that performs the required calculation as if BYROW would return the result, but the function is converted to one the returns a thunk and hence avoids the array of arrays problem.

 

 

ThunkFnλ
"Accepts a user-defined Lambda function that returns an array result and generates a related function that returns the corresponding thunk"
=LAMBDA(arr, LAMBDA(userFnλ(arr)))

 

 

Finally

 

 

Derefϑarrλ
"Dereferences a term from a thunk array using a zero-based binary pointer and expands the resulting scalar thunk"
=INDEX(resultϑ, DECIMAL(ptr, 2) + 1, 1)()

 

 

converts the binary node pointer to a decimal index.

 

This is all crazily heavy but, hopefully, would be simple to reuse for other problems since the main processing is problem independent.

14 Replies

I must be dense (or my brain may have been scrambled by trying to wrap itself around the concept of thunks for so long) but let ask a simple question. You say:


"it is intended to non-problem specific and hence reusable. ...the user provides a Lambda function that performs the required calculation as if BYROW would return the result.."

 

So let's say I have range of three one-column rows each containing a first and last name, comma separated ("Elvis,Presley", for example).
Trying

 

=BYROW(range,LAMBDA(row,TEXTSPLIT(row,",")))

 

results in your classic #CALC! and the admonition that "nested arrays are not supported".

 

What do I need to modify in your attached workbook so as to get the desired output (a 2X3 array)?

 

As you mention, this is not a specific request for help. In an ideal world (short of Excel resolving the array-of-arrays issue natively), I would have a black box into which I would feed the relevant array, the attack approach (by row, by column), the function to apply (TEXTSPLIT(), in this example) and the argument(s) required by that specific function (a delimiter, in this example) - and have the box spit out the correct outcome; something like:

 

=blackbox(range,byrow,textsplit,",")

 

 

One can only hope, right?

@TheDub 

You might well ask!  As things stand I believe there is an error in the calculation that means it only works correctly when applied to the original test data.  What was intended was that the user would supply a new Lambda function that accepts one row of their data and produces the result they require of it, be it a scalar (which works with the inbuilt helper function BYROW), a row array or, even, a 2D array.

 

It seems, however that, while I was deriving the algorithm for building the binary tree and 'pruning' branches that would have allowed the indices go beyond the array size and led to #REF! errors , I had introduced a global variable 'resultϑ' to provide the thunk array as test input.  Although I now calculate the thunk array within the formula, I suspect that variable is going out of scope during the recursion and the formula is defaulting to the workbook-scoped version and producing a hybrid mess of results.

 

Back to the drawing board!  I will let you know what happens.

@TheDub 

I think this does it!  I have removed the array of thunks from name manager and passed it as a parameter.

 

I have added an example that arose in a LinkedIn discussion from Marco Filocamo

(21) Post | Feed | LinkedIn

as well as your example that called for a similar text split.

image.png

The 'major' change for the user is to append the Greek letter λ to BYROW so that the formula picks up my function rather than the in-built helper function.

I still have to digest the "behind the scenes" a bit, but to paraphrase My Fair Lady: "By George! I think you've got it!"

@Peter Bartholomew Impressive work! I've been reading your posts/discussions regarding Thunks, but haven't taken the time to dive in and try to understand them yet. Seems to be full of potential...

 

I was hoping this method might somehow bypass the limits placed on lambda recursion, but alas, it seems to only work well with a few hundred rows of data or less. I had experimented with recursive lambdas a few months ago and came up with a custom STACKBYROW function that achieved similar results; however, it too was only useful with smaller data sets:

 

STACKBYROW:
=LAMBDA(array,function,[initial_value],[start],[pad_with],
    LET(
        n, MAX(start, 1),
        f, function(INDEX(array, n, 0)),
        v, IF(ISOMITTED(initial_value), f, IFNA(VSTACK(initial_value, f), pad_with)),
        IF(n<ROWS(array), STACKBYROW(array, function, v, n+1, pad_with), v)
    )
)

 

stackbyrow_lambda.png

 

When I finally get some time to take look at Thunks, I'll definitely be using this as a basis for understanding how they work. Thanks!

@djclements 

If you are hitting recursion limits, something has been lost along the way.  My first experiments with bisection (I dare say pretty standard within professional IT algorithm development but a struggle for me) predated the release of lambda helper functions and were specifically an attempt to get around the recursion limit on total argument count by reducing the depth of any recursion from N to LOG₂(N).

 

I resurrected the approach to deal with the REDUCE/VSTACK performance limitations that was itself a workaround for the fact that Excel treats arrays of arrays as an error rather than celebrating it as the correct solution to many, if not most, problems.  Whilst bisection required twice as many VSTACK operations, most were small and fast rather than repeatedly stacking large arrays.

@TheDub 

I may well be running round in circles but I have learnt a lot from implementing the binary tree.  The feature that a really want to hold on to is that the syntax of the final formula (as well as the Lambda function) should match the in-built BYROW precisely.  I do have some concerns about computational efficiency though.

 

I have written a BYCOL version that works will minimal modifications (basically changing ROWS(

resultϑ) to COLS(resultϑ) and VSTACK to HSTACK.  I have confirmed that the existing BinaryTreeλ will also work with thunk arrays generated by SCAN, but the modified helper function SCANλ is harder to write.  The modification to the user-provided Lambda function not only has to convert the output to thunks but it also has to reverse the calculation when the first 'acc' parameter is read.

 

 


I may well be running round in circles

I can only imagine what you brain must feel like in your skull...


I have written a BYCOL version that works will minimal modifications

Can you please share that too?

I would argue that if you successfully covered BYROW and BYCOL, you have covered 70%-80% of the array-of-arrays cases (at least in my limited experience) and made a major contribution to our mental health and welfare.

@TheDub 

Sadly, I seem to have messed up the workbook.  BYROWλ and BYCOLλ both seemed to be working well but then I started to work on SCANλ and suddenly the workbook started to fall apart.  #VALUE! errors show which disappear if one is persistent and re-enter the formula.  Only then another formula will suddenly error.

 

As far as SCANλ is concerned, it was to act as a wrapper for SCAN, but dethunk the accumulation parameter before using it and changing the user-provided Lambda function to apply THUNK to each result.  It all seemed so straightforward ...

 

@TheDub 

Is the workbook unstable on your computer?  I think I will separate the SCAN, with the thunks used to handle the 2 column array used to collect the Fibonacci series, to another workbook since that is where the trouble started.   

 

I have just submitted 'My Feedback' to Microsoft.  It is not a system I am familiar with, so we will see what happens!

Not sure what exactly you mean by "unstable", but I'm having enough occasional hiccups to require me to exit Excel and restart. Very frustrating..

@TheDub 

Maybe I am lucky then; I haven't needed restarts.  In the present case what I hit was a situation in which the SCAN (as applied to generating the Fibonacci series) would show a value error.  Recommit the formula and it would show the correct results.  But simultaneously one of the other formulas would show an error.  Recommit that and, with a bit of luck, both would show as correct.

 

I have also been experimenting with an alternative approach to expanding an array of thunks.  Since many, if not all, nested array problems can be expressed as an array of thunks, that could be regarded as a very general problem arising in many contexts.  I was, and still am, rather proud of the binary pointer approach that built the binary tree recursion but the new idea is to use WRAPROWS to place the thunks into pairs and then combine each pair into a single thunk holding two stacked arrays.  The process is repeated using REDUCE until there is only one thunk holding the entire solution.  Add the null parameter string and there you are, home and dry.

 

I am still to decide on the best approach for dealing with an odd number of thunks in the array to make sure the #N/A in the final pair does not propagate as an error through the calculation.

 

Then I will need to test for performance.  It is one thing to get some code working for arrays of length 10 and quite another to be sure everything still works when you reach 10,000 (never mind a million)!

@TheDub 

I have completed the alternative approach to the problem of expanding a column of thunks to give a 2D array.  As before, the task of generating the column of thunks uses a BYROW wrapper that first reduces the column of row arrays to a column of thunks and then uses a binary tree to expand and stack the results.

 

I do have concerns relating to the possible efficiency of the process but first one needs to obtain answers and only then does efficiency become relevant!

The same challenge but a somewhat different solution.  Instead of selecting thunks to combine using INDEX with arguments generated by the binary tree I used array shaping functions WRAPROWS and TAKE to pair up thunks for combining.  It still uses a binary tree but REDUCE is used to work through the levels until the entire solution is held as a single array.

 

This is the main lambda function BYROWλ (made a bit more verbose by the change control block and runtime documentation)

/*  FUNCTION NAME:  BYROWλ
    DESCRIPTION:    Implements a version of BYROW that will return an array of arrays */
/*  REVISIONS:      Date            Developer           Description
                    09 May 2024     Peter Bartholomew   Original Developemnt  
*/

BYROWλ = LAMBDA(
//  Parameter Declarations     
    array,fnλ,
//  Procedure
    LET(help,       TRIM(TEXTSPLIT(
                        "DESCRIPTION:   →Implements a version of BYROW that will return an array of arrays.¶" &
                        "VERSION:       →09 May 2004 ¶" &
                        "PARAMETERS:    →¶" &
                        "  array          →(Required) A two dimensional array of values. ¶" &
                        "  fnλ            →(Required) A lambda function that accepts a row array as input.",
                        "→", "¶" )
                    ),
    //  Check inputs
        array,          IF(OR(ISOMITTED(array), array=""),      #Value!, array),
        fnλ,            IF(OR(ISOMITTED(fnλ), TYPE(fnλ)<>128),  #Value!, fnλ),
    //  Procedure
        n,              ROWS(array),
        //              Implements BYROW returning thunked results at each step
        thunkArrayϑ,    BYROW(array, LAMBDA(row, THUNK(fnλ(row)))),
        //              Recombine pairs of thunks as a binary tree until the root node is reached
        k,              SEQUENCE(LEN(BASE(n - 1, 2))),
        recombinedϑ,    @REDUCE(thunkArrayϑ, k, JOINPAIRSλ),
        result,         TAKE(recombinedϑ(), n),
        //  Handle Error
        error,          MAX(IsError(result)+1),
    //  Return result
        CHOOSE(error, result, Help)
    )
);

It calls a further function to perform the pairwise stacking of the contents of thunks.

/*  FUNCTION NAME:  JOINPAIRSλ
    DESCRIPTION:    Called by BYROW to stack the contents of thunks pairwise */
/*  REVISIONS:      Date            Developer           Description
                    09 May 2024     Peter Bartholomew   Original Developemnt  
*/
JOINPAIRSλ
= LAMBDA(thunkArray, [k],
    LET(

        alternate, WRAPROWS(thunkArray, 2, thunk("")),
        firstpart, TAKE(alternate, , 1),
        finalpart, TAKE(alternate, , -1),

        MAP(
            firstpart,
            finalpart,
            LAMBDA(ϑ₁, ϑ₂, LAMBDA(VSTACK((@ϑ₁)(), (@ϑ₂)())))
        )
    )
);

 The tricky part was ensuring that the single thunks did not return type 64 (array) but were coerced to type 128 (lambda function).  

 

I have tested the functions up to 8192 rows and, although the performance wasn't sparkling, it only took a few seconds.