Silver Contributor

Jan 18, 2025

What do you think of thunks?

OK, so the most likely response by far is going to be "I don't". However, I tried one of Omid Motamedisedeh's regular challenges and found it a suitable problem for exploring some of the lesser k...

Rolling Maximum over 5 periods.xlsx51 KB

developer

excel

Formulas and Functions

Silver Contributor

Jan 23, 2025

Hi David

You raise some really interesting points.

"Honestly, the number of scenarios I've come across where thunks were required, or where thunks were the most efficient solution, have been few and far between"

My thoughts are that, as with any computational framework, there tend to be a spectrum of methods from methods of choice though to methods of last resort. That is not new. For me OFFSET tends to be a method of last resort because of its volatility, yet I like the function's syntax and the order of calculation when it appears in a formula allows it to achieve results that would otherwise be problematic.

The same might be said for thunks. They are something of an untested curiosity, yet they provide a uniform approach to dealing with arrays of arrays (that is normal spreadsheet solutions). The question is "where do they sit on the spectrum of methods?" Some of the properties of thunks are not new but, rather, simply inherited from the Defined Names that have been around from 1992 or earlier.

For example, one could define a Name 'maxOccurrences' to be

= MAX(COUNTIFS(array, array))

As it stands, it is simply the definition of a computationally intensive formula. It is not evaluated until it is used by a grid formula. The catch is that the formula is re-evaluated from scratch every time it is referenced. Something similar goes for the thunked version:

= maxOccurrencesλ

where

maxOccurrencesλ = LAMBDA(MAX(COUNTIFS(array#, array#)))

The difference here is that merely being referenced by a grid formula is not sufficient to force its evaluation. That only happens when it is provided with its parameter string

= maxOccurrencesλ()

Where I am going with this somewhat rambling discourse is that thunks and, in particular, arrays of thunks could be the method of choice for dealing with problems involving arrays of arrays with problem-specific approaches such as MAKEARRAY being the methods of last resort.

Before signing off, I ought to confess to a weakness in the argument. At present there is no inbuilt function that will persist that calculation beyond the in-memory LAMBDA scope in which it was created, by outputting the result to the grid. The following function does the job. but one shouldn't be dipping into a user-provided Lambda to achieve core functionality!

p.s. Some of your recommendations found their way into my GIST. Thank you!

/*  FUNCTION NAME:  EVALTHUNKARRλ
    DESCRIPTION:    Called by modified helper functions to stack the contents of any array of thunks  */
/*  REVISIONS:      Date            Developer           Description
                    14 Oct 2024     Peter Bartholomew   Extracted for MAPλ as a separate module  
*/
EVALTHUNKARRλ = LAMBDA(thunkArrayϑ,
    LET(
        m, ROWS(thunkArrayϑ),
        n, COLUMNS(thunkArrayϑ),
        h, SEQUENCE(CEILING.MATH(LOG(n,2),1)),
        recombinedRowsϑ, IF(
            n > 1,
            BYROW(thunkArrayϑ, LAMBDA(thunkRowϑ, @REDUCE(thunkRowϑ, h, JOINPAIRSλ(1)))),
            thunkArrayϑ
        ),
        k, SEQUENCE(CEILING.MATH(LOG(m,2),1)),
        recombinedϑ, IF(
            m > 1, 
            REDUCE(recombinedRowsϑ, k, JOINPAIRSλ(0)), 
            recombinedRowsϑ
        ),
        result, IFNA((@recombinedϑ)(), ""),
        result
    )
);

/*  FUNCTION NAME:  JOINPAIRSλ
    DESCRIPTION:    Called by EVALTHUNKARRλ to stack the contents of thunks pairwise  */
/*  REVISIONS:      Date            Developer           Description
                    09 May 2024     Peter Bartholomew   Original Development  
                    16 May 2024     Peter Bartholomew   Test for unpaired thunk in binary tree
                    30 Aug 2024     Peter Bartholomew   Modify to stack horizontally or vertically
                    15 Nov 2024     David Clements      Efficiency improvements / simplification
*/

JOINPAIRSλ = LAMBDA([by_col], LAMBDA(thunkArray, [k],
    LET(
        STACKλ,    IF(by_col, HSTACK, VSTACK),
        alternate, WRAPROWS(thunkArray, 2),
        MAP(
            TAKE(alternate, , 1),
            DROP(alternate, , 1),
            LAMBDA(ϑ₁, ϑ₂,
                LET(
                    x₁, ϑ₁(), 
                    x₂, ϑ₂(), 
                    v,  IF(TYPE(ϑ₂)=16, x₁, STACKλ(x₁, x₂)), 
                    LAMBDA(v)
                )
            )
        )
    )
));

A version of Excel MAP helper function that will return an array of arrays

djclements

Silver Contributor

Jan 23, 2025

Hi Peter,

Just to clarify: my comment about thunks rarely being "the most efficient solution" shouldn't be misconstrued as being "inefficient". I've actually found thunks to be very efficient in many, if not all cases. Your custom LAMBDA functions are by far the best generalized approach I've seen for handling nested arrays, completely destroying the standard DROP-REDUCE-STACK method.

Having said that, whenever an array manipulation/reshaping method exists for a particular problem, thunks tend to come in second or third when timed over a larger dataset (but not by much). I believe this to be, in part, due to the fact that thunks require at least two passes over an array: once to generate the thunks, and again to call and return the results.

To give them a fair shake, though, it may be prudent to rate thunks on a scale from 1 to 10 in different categories (e.g. efficiency, flexibility, adaptability, difficulty, etc.). I might give them an 8 or 9 in efficiency, but definitely a 10 in flexibility and adaptability. Difficulty level would depend on the person, but for the average user, it's probably a 10.

I admit, the alternative methods I often share are anything but generalized, and it's unrealistic to think the average user would have the same interest in exploring and remembering each one, especially considering many of them would also likely be ranked high in difficulty. For me, though, the challenge is half the fun! Where the average user may find comfort in an easily adaptable, generalized solution, I think I would find monotony, lol.

Regarding the OFFSET function, I totally agree with you. I only included it as an option because it actually proved to be the fastest solution when tested with 100K rows of data. I had started with an INDEX equivalent first:

=LET(d,Ranges!D:D,r,ROW(Data),m,MIN(r),n,MAX(r),FILTER(Data,MAP(dataValues,INDEX(d,m-2):INDEX(d,n-2),INDEX(d,m+2):INDEX(d,n+2),LAMBDA(v,b,a,MAX(b:a)=v))))

But found it to be a touch faster when converted to OFFSET. When I saw Daniel's variation, I immediately realized that I hadn't reduced my own version down to its simplest form, which would've been:

=FILTER(Data,MAP(dataValues,LAMBDA(v,MAX(OFFSET(v,-2,,5))=v)))

Which also means the original version could've been simplified as:

=FILTER(Data,MAP(dataValues,LAMBDA(v,MAX(INDEX(Ranges!D:D,ROW(v)-2):INDEX(Ranges!D:D,ROW(v)+2))=v)))

Not a big deal, though. If this were a real-world scenario, I'd likely sacrifice the 40 or so milliseconds and go with the second formula I shared (BYROW over an INDEX-SEQUENCE+SEQUENCE array) to avoid the OFFSET function altogether.

Cheers!

Forum Discussion

What do you think of thunks?