Forum Discussion
What do you think of thunks?
Hi David
You raise some really interesting points.
"Honestly, the number of scenarios I've come across where thunks were required, or where thunks were the most efficient solution, have been few and far between"
My thoughts are that, as with any computational framework, there tend to be a spectrum of methods from methods of choice though to methods of last resort. That is not new. For me OFFSET tends to be a method of last resort because of its volatility, yet I like the function's syntax and the order of calculation when it appears in a formula allows it to achieve results that would otherwise be problematic.
The same might be said for thunks. They are something of an untested curiosity, yet they provide a uniform approach to dealing with arrays of arrays (that is normal spreadsheet solutions). The question is "where do they sit on the spectrum of methods?" Some of the properties of thunks are not new but, rather, simply inherited from the Defined Names that have been around from 1992 or earlier.
For example, one could define a Name 'maxOccurrences' to be
= MAX(COUNTIFS(array, array))As it stands, it is simply the definition of a computationally intensive formula. It is not evaluated until it is used by a grid formula. The catch is that the formula is re-evaluated from scratch every time it is referenced. Something similar goes for the thunked version:
= maxOccurrencesλ
where
maxOccurrencesλ = LAMBDA(MAX(COUNTIFS(array#, array#)))The difference here is that merely being referenced by a grid formula is not sufficient to force its evaluation. That only happens when it is provided with its parameter string
= maxOccurrencesλ()Where I am going with this somewhat rambling discourse is that thunks and, in particular, arrays of thunks could be the method of choice for dealing with problems involving arrays of arrays with problem-specific approaches such as MAKEARRAY being the methods of last resort.
Before signing off, I ought to confess to a weakness in the argument. At present there is no inbuilt function that will persist that calculation beyond the in-memory LAMBDA scope in which it was created, by outputting the result to the grid. The following function does the job. but one shouldn't be dipping into a user-provided Lambda to achieve core functionality!
p.s. Some of your recommendations found their way into my GIST. Thank you!
/* FUNCTION NAME: EVALTHUNKARRλ
DESCRIPTION: Called by modified helper functions to stack the contents of any array of thunks */
/* REVISIONS: Date Developer Description
14 Oct 2024 Peter Bartholomew Extracted for MAPλ as a separate module
*/
EVALTHUNKARRλ = LAMBDA(thunkArrayϑ,
LET(
m, ROWS(thunkArrayϑ),
n, COLUMNS(thunkArrayϑ),
h, SEQUENCE(CEILING.MATH(LOG(n,2),1)),
recombinedRowsϑ, IF(
n > 1,
BYROW(thunkArrayϑ, LAMBDA(thunkRowϑ, @REDUCE(thunkRowϑ, h, JOINPAIRSλ(1)))),
thunkArrayϑ
),
k, SEQUENCE(CEILING.MATH(LOG(m,2),1)),
recombinedϑ, IF(
m > 1,
REDUCE(recombinedRowsϑ, k, JOINPAIRSλ(0)),
recombinedRowsϑ
),
result, IFNA((@recombinedϑ)(), ""),
result
)
);
/* FUNCTION NAME: JOINPAIRSλ
DESCRIPTION: Called by EVALTHUNKARRλ to stack the contents of thunks pairwise */
/* REVISIONS: Date Developer Description
09 May 2024 Peter Bartholomew Original Development
16 May 2024 Peter Bartholomew Test for unpaired thunk in binary tree
30 Aug 2024 Peter Bartholomew Modify to stack horizontally or vertically
15 Nov 2024 David Clements Efficiency improvements / simplification
*/
JOINPAIRSλ = LAMBDA([by_col], LAMBDA(thunkArray, [k],
LET(
STACKλ, IF(by_col, HSTACK, VSTACK),
alternate, WRAPROWS(thunkArray, 2),
MAP(
TAKE(alternate, , 1),
DROP(alternate, , 1),
LAMBDA(ϑ₁, ϑ₂,
LET(
x₁, ϑ₁(),
x₂, ϑ₂(),
v, IF(TYPE(ϑ₂)=16, x₁, STACKλ(x₁, x₂)),
LAMBDA(v)
)
)
)
)
));A version of Excel MAP helper function that will return an array of arrays
Hi Peter,
Just to clarify: my comment about thunks rarely being "the most efficient solution" shouldn't be misconstrued as being "inefficient". I've actually found thunks to be very efficient in many, if not all cases. Your custom LAMBDA functions are by far the best generalized approach I've seen for handling nested arrays, completely destroying the standard DROP-REDUCE-STACK method.
Having said that, whenever an array manipulation/reshaping method exists for a particular problem, thunks tend to come in second or third when timed over a larger dataset (but not by much). I believe this to be, in part, due to the fact that thunks require at least two passes over an array: once to generate the thunks, and again to call and return the results.
To give them a fair shake, though, it may be prudent to rate thunks on a scale from 1 to 10 in different categories (e.g. efficiency, flexibility, adaptability, difficulty, etc.). I might give them an 8 or 9 in efficiency, but definitely a 10 in flexibility and adaptability. Difficulty level would depend on the person, but for the average user, it's probably a 10.
I admit, the alternative methods I often share are anything but generalized, and it's unrealistic to think the average user would have the same interest in exploring and remembering each one, especially considering many of them would also likely be ranked high in difficulty. For me, though, the challenge is half the fun! Where the average user may find comfort in an easily adaptable, generalized solution, I think I would find monotony, lol.
Regarding the OFFSET function, I totally agree with you. I only included it as an option because it actually proved to be the fastest solution when tested with 100K rows of data. I had started with an INDEX equivalent first:
=LET(d,Ranges!D:D,r,ROW(Data),m,MIN(r),n,MAX(r),FILTER(Data,MAP(dataValues,INDEX(d,m-2):INDEX(d,n-2),INDEX(d,m+2):INDEX(d,n+2),LAMBDA(v,b,a,MAX(b:a)=v))))But found it to be a touch faster when converted to OFFSET. When I saw Daniel's variation, I immediately realized that I hadn't reduced my own version down to its simplest form, which would've been:
=FILTER(Data,MAP(dataValues,LAMBDA(v,MAX(OFFSET(v,-2,,5))=v)))Which also means the original version could've been simplified as:
=FILTER(Data,MAP(dataValues,LAMBDA(v,MAX(INDEX(Ranges!D:D,ROW(v)-2):INDEX(Ranges!D:D,ROW(v)+2))=v)))Not a big deal, though. If this were a real-world scenario, I'd likely sacrifice the 40 or so milliseconds and go with the second formula I shared (BYROW over an INDEX-SEQUENCE+SEQUENCE array) to avoid the OFFSET function altogether.
Cheers!