Forum Discussion
What do you think of thunks?
Good day Peter,
As always, I appreciate the demonstration and hope others do too. Personally, I find thunks to be quite useful for highly complex scenarios, as they can be an elegant way of handling nested arrays; however, it's not necessarily the first method I would reach for in any given situation. Honestly, the number of scenarios I've come across where thunks were required, or where thunks were the most efficient solution, have been few and far between, but that hasn't stopped me from experimenting with them in the least. ;)
For this particular challenge, there's a couple of methods that come to mind. For starters, MAP with OFFSET would work, although it seems like a bit of a cheat considering the table data range begins on row 3, leaving exactly 2 rows above to spare (if there weren't enough rows above the data range to complete the task, this wouldn't work):
=FILTER(Data,MAP(dataValues,OFFSET(dataValues,-2,0),OFFSET(dataValues,2,0),LAMBDA(v,b,a,MAX(b:a)=v)))
Another option could be BYROW with INDEX and a SEQUENCE matrix:
=FILTER(Data,BYROW(INDEX(VSTACK(0,0,dataValues,0,0),SEQUENCE(ROWS(Data),,0)+SEQUENCE(,5)),MAX)=dataValues)
On the topic of thunks, though, you may find this interesting... when it comes to returning the results for multiple functions in a single array (e.g. ROWS, ISREF, MAX, etc.), you can do so with a single MAP function:
=LET(
arrϑ, ROLLINGRANGEλ(dataValues, 5),
func, HSTACK(ISREF,ROWS,AVERAGE,MAX),
MAP(IFNA(func,arrϑ),IFNA(arrϑ,func),LAMBDA(fn,ϑ,fn(ϑ())))
)
Plus, the extra column on the end in your sample file:
=LET(
arrϑ, ROLLINGRANGEλ(dataValues, 5),
func, HSTACK(ISREF,ROWS,AVERAGE,MAX),
calc, MAP(IFNA(func,arrϑ),IFNA(arrϑ,func),LAMBDA(fn,ϑ,fn(ϑ()))),
HSTACK(calc,IF(TAKE(calc,,-1)=dataValues,dataValues,""))
)
Fun, fun, fun! Have a good one!
- PeterBartholomew1Jan 23, 2025Silver Contributor
Hi David
You raise some really interesting points.
"Honestly, the number of scenarios I've come across where thunks were required, or where thunks were the most efficient solution, have been few and far between"
My thoughts are that, as with any computational framework, there tend to be a spectrum of methods from methods of choice though to methods of last resort. That is not new. For me OFFSET tends to be a method of last resort because of its volatility, yet I like the function's syntax and the order of calculation when it appears in a formula allows it to achieve results that would otherwise be problematic.
The same might be said for thunks. They are something of an untested curiosity, yet they provide a uniform approach to dealing with arrays of arrays (that is normal spreadsheet solutions). The question is "where do they sit on the spectrum of methods?" Some of the properties of thunks are not new but, rather, simply inherited from the Defined Names that have been around from 1992 or earlier.
For example, one could define a Name 'maxOccurrences' to be
= MAX(COUNTIFS(array, array))
As it stands, it is simply the definition of a computationally intensive formula. It is not evaluated until it is used by a grid formula. The catch is that the formula is re-evaluated from scratch every time it is referenced. Something similar goes for the thunked version:
= maxOccurrencesλ where maxOccurrencesλ = LAMBDA(MAX(COUNTIFS(array#, array#)))
The difference here is that merely being referenced by a grid formula is not sufficient to force its evaluation. That only happens when it is provided with its parameter string
= maxOccurrencesλ()
Where I am going with this somewhat rambling discourse is that thunks and, in particular, arrays of thunks could be the method of choice for dealing with problems involving arrays of arrays with problem-specific approaches such as MAKEARRAY being the methods of last resort.
Before signing off, I ought to confess to a weakness in the argument. At present there is no inbuilt function that will persist that calculation beyond the in-memory LAMBDA scope in which it was created, by outputting the result to the grid. The following function does the job. but one shouldn't be dipping into a user-provided Lambda to achieve core functionality!
p.s. Some of your recommendations found their way into my GIST. Thank you!
/* FUNCTION NAME: EVALTHUNKARRλ DESCRIPTION: Called by modified helper functions to stack the contents of any array of thunks */ /* REVISIONS: Date Developer Description 14 Oct 2024 Peter Bartholomew Extracted for MAPλ as a separate module */ EVALTHUNKARRλ = LAMBDA(thunkArrayϑ, LET( m, ROWS(thunkArrayϑ), n, COLUMNS(thunkArrayϑ), h, SEQUENCE(CEILING.MATH(LOG(n,2),1)), recombinedRowsϑ, IF( n > 1, BYROW(thunkArrayϑ, LAMBDA(thunkRowϑ, @REDUCE(thunkRowϑ, h, JOINPAIRSλ(1)))), thunkArrayϑ ), k, SEQUENCE(CEILING.MATH(LOG(m,2),1)), recombinedϑ, IF( m > 1, REDUCE(recombinedRowsϑ, k, JOINPAIRSλ(0)), recombinedRowsϑ ), result, IFNA((@recombinedϑ)(), ""), result ) ); /* FUNCTION NAME: JOINPAIRSλ DESCRIPTION: Called by EVALTHUNKARRλ to stack the contents of thunks pairwise */ /* REVISIONS: Date Developer Description 09 May 2024 Peter Bartholomew Original Development 16 May 2024 Peter Bartholomew Test for unpaired thunk in binary tree 30 Aug 2024 Peter Bartholomew Modify to stack horizontally or vertically 15 Nov 2024 David Clements Efficiency improvements / simplification */ JOINPAIRSλ = LAMBDA([by_col], LAMBDA(thunkArray, [k], LET( STACKλ, IF(by_col, HSTACK, VSTACK), alternate, WRAPROWS(thunkArray, 2), MAP( TAKE(alternate, , 1), DROP(alternate, , 1), LAMBDA(ϑ₁, ϑ₂, LET( x₁, ϑ₁(), x₂, ϑ₂(), v, IF(TYPE(ϑ₂)=16, x₁, STACKλ(x₁, x₂)), LAMBDA(v) ) ) ) ) ));
A version of Excel MAP helper function that will return an array of arrays
- djclementsJan 23, 2025Bronze Contributor
Hi Peter,
Just to clarify: my comment about thunks rarely being "the most efficient solution" shouldn't be misconstrued as being "inefficient". I've actually found thunks to be very efficient in many, if not all cases. Your custom LAMBDA functions are by far the best generalized approach I've seen for handling nested arrays, completely destroying the standard DROP-REDUCE-STACK method.
Having said that, whenever an array manipulation/reshaping method exists for a particular problem, thunks tend to come in second or third when timed over a larger dataset (but not by much). I believe this to be, in part, due to the fact that thunks require at least two passes over an array: once to generate the thunks, and again to call and return the results.
To give them a fair shake, though, it may be prudent to rate thunks on a scale from 1 to 10 in different categories (e.g. efficiency, flexibility, adaptability, difficulty, etc.). I might give them an 8 or 9 in efficiency, but definitely a 10 in flexibility and adaptability. Difficulty level would depend on the person, but for the average user, it's probably a 10.
I admit, the alternative methods I often share are anything but generalized, and it's unrealistic to think the average user would have the same interest in exploring and remembering each one, especially considering many of them would also likely be ranked high in difficulty. For me, though, the challenge is half the fun! Where the average user may find comfort in an easily adaptable, generalized solution, I think I would find monotony, lol.
Regarding the OFFSET function, I totally agree with you. I only included it as an option because it actually proved to be the fastest solution when tested with 100K rows of data. I had started with an INDEX equivalent first:
=LET(d,Ranges!D:D,r,ROW(Data),m,MIN(r),n,MAX(r),FILTER(Data,MAP(dataValues,INDEX(d,m-2):INDEX(d,n-2),INDEX(d,m+2):INDEX(d,n+2),LAMBDA(v,b,a,MAX(b:a)=v))))
But found it to be a touch faster when converted to OFFSET. When I saw Daniel's variation, I immediately realized that I hadn't reduced my own version down to its simplest form, which would've been:
=FILTER(Data,MAP(dataValues,LAMBDA(v,MAX(OFFSET(v,-2,,5))=v)))
Which also means the original version could've been simplified as:
=FILTER(Data,MAP(dataValues,LAMBDA(v,MAX(INDEX(Ranges!D:D,ROW(v)-2):INDEX(Ranges!D:D,ROW(v)+2))=v)))
Not a big deal, though. If this were a real-world scenario, I'd likely sacrifice the 40 or so milliseconds and go with the second formula I shared (BYROW over an INDEX-SEQUENCE+SEQUENCE array) to avoid the OFFSET function altogether.
Cheers!
- PeterBartholomew1Jan 20, 2025Silver Contributor
Hi David
I was aware that for the specific challenge I could avoid thunks altogether but my focus was more to answer the question "can thunks provide a unified approach to dealing with a number of array problems as one moves from scalar, to array, to array of arrays, ... ?" Ultimately, it could be a case of "why restrict yourself to one financial model?" when you could map them to a changing set of assumptions and create an array of models, rather like data tables on steroids!
Along the way, it would be nice if a single technique were to provide an efficient way of addressing headaches such as handling arrays of ranges, arrays of arrays, ragged arrays, nested arrays, cartesian products of 2D arrays. The solution may be something other than thunks, but at the moment they are looking promising and it is certainly better than the dreaded #CALC! error that tells you that the solution to your problem is not handled by Excel.
The think that really caught my attention was your idea of applying an array of functions to each data set as a single operation. Although I would have said the idea "could well be possible", that is not the same thing as adjusting one's thinking sufficiently to exploit the concept with any degree of confidence!
Well done.