Forum Discussion
Double Thunking Works Wonders!
Your post drew my attention to the fact that, although I have given a lot of attention to unravelling arrays of thunks that arise 'naturally' as workarounds for the 'array of arrays', 'nested arrays' and 'array of ranges' limitations of Excel, I have paid little attention to the modelling of nested multi-dimensional arrays.
This is my attempt to rectify that situation. By all means ignore the post, at least until you have completed your testing. I would hate to come between a man and his rigorous testing regime or even spoil the fun of the chase! I think it is a worthwhile problem because Excel data quite often seems to be prepared as an array of forms, each containing array data. That, despite the wishes of the more database-oriented of our friends, who would prefer to see all data in normalised tables.
In fact, you do show the process of normalising the data as one of your outputs, but my past experience is that the steps that allow the data to be normalised as a starting point for standard solution processes often allow the desired output to be derived directly from the cross-tabulated input.
I will show a picture of my worksheet because that will allow you to see what I have done without necessarily getting tied up with the 'how'.
I think the function is at a good point where I'm ready to share. I can think of further improvements that can be made (e.g. a 'direction' parameter to offer both 'scan by column' and 'scan by row' and even the option to return the output as a vector), but the tinkering may never end! Adding parameter checks and more sophisticated error prevention can wait, too.
PeterBartholomew1's EVALTHUNKARRλ unpacked the thunks roughly 10x faster than conventional unpacking with REDUCE.
//Re-shape a matrix by specifying width and depth. 'shape' - 0 to vertically stack arrays,
//1 to horizontally stack arrays. 'shape' defaults to vertical stack if omitted.
//'pad_with' defaults to "" if omitted.
ReShapeMatrixλ = LAMBDA(matrix, depth, width, [shape], [pad_with],
LET(
//Prevent uneven distribution of rows and/or columns by
//ensuring dimensions are properly rounded according to
//depth and width.
i, CEILING.MATH(ROWS(matrix), depth),
j, CEILING.MATH(COLUMNS(matrix), width),
//Set padding and expand matrix as needed.
padding, IF(ISOMITTED(pad_with), "", pad_with),
M, EXPAND(matrix, i, j, padding),
//Create two matrices which serve as row and column coordinates
//for INDEX.
M_rows, SEQUENCE(i / depth, , , depth) * SEQUENCE(, j / width, , 0),
M_cols, SEQUENCE(, j / width, , width) * SEQUENCE(i / depth, , , 0),
//Generate 'r' and 'c' coordinates based on starting row (v₁) and column (v₂).
//Retrieve 'block' from matrix as a vector and stuff in a thunk.
CreateVectors, LAMBDA(v₁, v₂,
LET(
r, TOCOL(SEQUENCE(depth, , v₁) * SEQUENCE(, width, , 0)),
c, TOCOL(SEQUENCE(, width, v₂) * SEQUENCE(depth, , , 0)),
block, INDEX(M, r, c),
THUNK(block)
)
),
//Map and convert to vector to prepare for unpacking.
thunks, TOCOL(MAP(M_rows, M_cols, CreateVectors)),
//Unpack thunks.
unpacked, EVALTHUNKARRλ(thunks),
//Wrap vector according to desired shape.
vshaped, WRAPROWS(unpacked, depth * width),
hshaped, WRAPCOLS(unpacked, depth * width),
reshaped, IF(
ISOMITTED(shape),
vshaped,
IF(shape = 1, hshaped, vshaped)
),
reshaped
)
);
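For anyone more comfortable reading procedural code, here is a rough Python sketch of the same block-carving idea (the function name and the list-of-lists representation are mine, not part of the workbook):

```python
def reshape_blocks(matrix, depth, width, pad_with=""):
    """Carve a matrix (list of equal-length rows) into depth x width blocks,
    scanning left-to-right then top-to-bottom, and return one flattened
    block per output row -- the 'vertical stack' default of ReShapeMatrixλ."""
    rows, cols = len(matrix), len(matrix[0])
    # Round both dimensions up to clean multiples of the block size and pad,
    # as CEILING.MATH plus EXPAND do in the Excel version.
    padded_rows = -(-rows // depth) * depth
    padded_cols = -(-cols // width) * width
    padded = [[matrix[r][c] if r < rows and c < cols else pad_with
               for c in range(padded_cols)] for r in range(padded_rows)]
    blocks = []
    for top in range(0, padded_rows, depth):
        for left in range(0, padded_cols, width):
            blocks.append([padded[r][c]
                           for r in range(top, top + depth)
                           for c in range(left, left + width)])
    return blocks
```

So a 2 x 4 matrix carved into 2 x 2 blocks yields two rows of four values each, matching WRAPROWS(unpacked, depth * width) in the function above.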
- Patrick2788 · Oct 03, 2025 · Silver Contributor
I've re-visited this task determined to pull this off with a bit more elegance.
BlockMapλ is the result:
// Function: BlockMapλ
// Author: Patrick H.
// Version: 1.0
// Published: 10/3/2025
// Repo: https://github.com/Patrick2788/Excel-Lambda/blob/main/BlockMap%CE%BB.xlsx
// Description: BlockMapλ reshapes a matrix row or column-wise by specified
//              block size (depth x width) with optional parameters for
//              shaping by column and transposing anchors.
BlockMapλ = LAMBDA(
    matrix,                //Input matrix
    depth,                 //Height of each block
    width,                 //Width of each block
    [re_shape_by_column?], //Optional: If TRUE, reshapes column-wise; else row-wise
    [transpose_anchors?],  //Optional: If TRUE, transpose row and col anchors; else normal anchors
    //Validate inputs
    Validateλ(matrix, depth, width, re_shape_by_column?,
    //Proceed
    LET(
        //----------Re-shape logic---------------------------------
        x, ROUNDUP(ROWS(matrix) / depth, 0),    //Block rows
        y, ROUNDUP(COLUMNS(matrix) / width, 0), //Block columns
        //Anchors
        i, SEQUENCE(x, , 1, depth) * SEQUENCE(, y, 1, 0),
        j, SEQUENCE(, y, 1, width) * SEQUENCE(x, , 1, 0),
        row_anchor, IF(ISOMITTED(transpose_anchors?), i, TRANSPOSE(i)),
        col_anchor, IF(ISOMITTED(transpose_anchors?), j, TRANSPOSE(j)),
        //Indices
        row_indices, TOCOL(row_anchor) + INT(SEQUENCE(, depth * width, 0, 1 / width)),
        col_indices, TOCOL(col_anchor) + TOROW(IFS(SEQUENCE(depth), SEQUENCE(, width, 0, 1))),
        //Output
        ReShapedMatrix, INDEX(matrix, row_indices, col_indices),
        output_matrix, IF(re_shape_by_column?, TRANSPOSE(ReShapedMatrix), ReShapedMatrix),
        output_matrix
    )));

//----------Error Handling---------------------------------
//Validate inputs for BlockMapλ.
Validateλ = LAMBDA(
    matrix,
    depth,
    width,
    re_shape_by_column?,
    on_valid,
    //Halt if not an array.
    IF(NOT(TYPE(matrix) = 64), "#MATRIX!",
    //Halt if re_shape_by_column? is TRUE and the result would return a #SPILL! error.
    IF((re_shape_by_column?) * (ROWS(matrix) * (COLUMNS(matrix)) / (width * depth)^2 > 16384), "#SPILL-RISK!",
    //Halt if the result would not be a clean re-shaping.
    IF((MOD(ROWS(matrix), depth) <> 0) + (MOD(COLUMNS(matrix), width) <> 0), "#BLOCK MISMATCH!",
    //Halt if block size exceeds matrix dimensions or is text.
    IF(OR(AND(width = 1, depth = 1), depth > ROWS(matrix), width > COLUMNS(matrix), ISTEXT(width), ISTEXT(depth)), "#DIMENSIONS!",
        on_valid
    )))));

- Patrick2788 · Dec 03, 2024 · Silver Contributor
After some more testing it appears 16,380 is indeed the new operand stack limit.
To be precise:
16,380 / (Total λ param + Total LET param + 1)
- Patrick2788 · Nov 23, 2024 · Silver Contributor
As I was testing each function and going beyond 8 million units, I noticed the recursive solution was hanging in there with Mapλ. Remember this informative article from a few years ago concerning recursion limits?
LAMBDA: Recursive Lists and Understanding Recursion Limits of LAMBDA | Microsoft Community Hub
The limit at the time being: 1024/(parameter count + 1)
I understand the Excel team quietly increased this limit (the numerator portion in the formula above) to a little over 2000 a few years ago.
I created a very rudimentary recursive function to test the current limit. Gradually adding a parameter and seeing how high it would go:
TestStack = LAMBDA(n, m, o, IF(n = 4094, n, TestStack(n + 1, m, o)));

The results of the test:
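As a cross-language aside (my own analogy, not from the thread): Python exposes a similar tunable limit, and the same 'probe until it breaks' experiment can be sketched there:

```python
import sys

def probe_depth(n=0):
    """Recurse until the interpreter refuses, then report how deep we got --
    the same gradually-deepening experiment as TestStack."""
    try:
        return probe_depth(n + 1)
    except RecursionError:
        return n

sys.setrecursionlimit(3000)  # rough analogue of Excel's operand stack budget
depth = probe_depth()
# depth comes out a little below the configured limit, since frames already
# on the stack count against it -- much like Excel's parameter-count overhead.
```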
- PeterBartholomew1 · Nov 23, 2024 · Silver Contributor
Those are very neat formulas. My recursion thinking largely stopped when the Lambda helper functions were released. Recursion is now the last-chance saloon before giving up, reserved for when I really do not know whether the number of recursive loops (is that an oxymoron?) could be 1 or 1,000.
The danger with recursive stacking can be the amount of memory temporarily committed to the calculation, all of which needs to be managed by the system. Naturally I followed your speculation and tried MAPλ. I needed to Curry the EXPAND function to allow it to accept the value to be expanded from MAP.
What I finished up with was
= MAPλ(Letters, Expandλ(25, 25))

// Expands a cell value to cover an (n₁ x n₂) array
Expandλ = LAMBDA(n₁, n₂, LAMBDA(x, EXPAND(x, n₁, n₂, x)));

- Patrick2788 · Nov 22, 2024 · Silver Contributor
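The same currying trick reads naturally in Python, where a function that fixes the dimensions first returns a one-argument function for the mapper (a sketch with invented names; Expandλ is approximated by filling a grid with the value):

```python
def expand_curried(n1, n2):
    """Fix the target dimensions first; the returned one-argument function
    expands a single value into an n1 x n2 grid of that value, so it can be
    handed to a map-style helper -- as Expandλ(25, 25) is passed to MAPλ."""
    return lambda value: [[value] * n2 for _ in range(n1)]

# Each letter becomes its own 2 x 3 grid.
grids = list(map(expand_curried(2, 3), ["a", "b"]))
```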
That's a fascinating approach to legacy Excel, using CSE for everything! It's interesting how it moved you away from the fill handle; I never thought of that. I can recall spending way too much time with FREQUENCY/MATCH counting unique items and relying heavily on INDEX/SMALL-IF. I don't miss those at all!
In re: pooling layers. I stumbled on an "opposite" operation, interpolation as used in image processing. In terms of matrices:
Seems to be a good opportunity to create a function to do this for text and/or numbers. Mapλ is likely much faster than my method with numbers. The challenge was to use recursion:
//Create square matrices from each value in a vertical array.
SquareMλ = LAMBDA(arr, x,
    LET(
        k, ROWS(arr),
        next, TAKE(arr, 1),
        resized, EXPAND(next, @x, @x, next),
        acc, VSTACK(x, resized),
        discard, DROP(arr, 1),
        final, DROP(acc, 1),
        IF(k = 1, final, SquareMλ(discard, acc))
    ));

//Nearest-Neighbor Interpolation
Magnifyλ = LAMBDA(matrix, magnification,
    LET(
        next, TAKE(matrix, , 1),
        resized, SquareMλ(next, @magnification),
        acc, HSTACK(magnification, resized),
        discard, DROP(matrix, , 1),
        k, COLUMNS(matrix),
        final, DROP(acc, , 1),
        IF(k = 1, final, Magnifyλ(discard, acc))
    ));

- PeterBartholomew1 · Nov 22, 2024 · Silver Contributor
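Stripped of the recursion, the underlying operation is simple: nearest-neighbour magnification repeats every element in an m x m block. A hedged Python sketch (iteration instead of Excel-style recursion; names are mine):

```python
def magnify(matrix, m):
    """Nearest-neighbour interpolation: each element of `matrix` becomes
    an m x m block of itself, which is what Magnifyλ builds recursively."""
    return [[value for value in row for _ in range(m)]
            for row in matrix for _ in range(m)]
```

For example, magnifying a 2 x 2 matrix by 2 yields a 4 x 4 matrix in which each original value occupies a 2 x 2 patch.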
I love it!
We had a solution; now we have the problem!🤩
- PeterBartholomew1 · Nov 22, 2024 · Silver Contributor
On your point concerning CSE formula, and speaking as a freak user that had schooled himself to enter all formulas using CSE, it is my opinion that the prior experience was a help rather than a hindrance in that case. I had broken the 'and send down' mentality that approaches lists and arrays as collections of scalar formulas. CSE was turgid though, and dynamic arrays came as a huge relief!
The other trick I used, especially with array formulas that return a single output, was to place formulas within defined names to avoid the implicit intersection that made such a mess of grid-based formulas.
The new methods bring with them a requirement for fresh learning. It is sometimes harder to identify the familiar knowledge that can safely be consigned to the trash-can of history! Anyone want to know how to divide weights expressed as st-lb-oz by 17 or perhaps extract square roots by hand? 😱
- Patrick2788 · Nov 21, 2024 · Silver Contributor
I've often wondered if there are visitors to this site that come across discussions like this and think "when would this ever be used?".
Convolutional neural networks - pooling layers
"In neural networks, a pooling layer is a kind of network layer that downsamples and aggregates information that is dispersed among many vectors into fewer vectors"INJECTλ handles this task easily:
=INJECTλ(M, 2, 2, MAX, BYARR)

- Patrick2788 · Nov 04, 2024 · Silver Contributor
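For readers outside Excel, the pooling operation itself is easy to state in Python (my sketch of the concept, not of INJECTλ's internals; it assumes the block size divides the matrix evenly):

```python
def pool2d(matrix, ph, pw, agg=max):
    """Downsample by applying `agg` over each ph x pw block -- the max-pooling
    step that =INJECTλ(M, 2, 2, MAX, BYARR) performs on the sheet."""
    rows, cols = len(matrix), len(matrix[0])
    return [[agg(matrix[r][c]
                 for r in range(top, top + ph)
                 for c in range(left, left + pw))
             for left in range(0, cols, pw)]
            for top in range(0, rows, ph)]
```

Passing a different aggregator (min, sum) gives the other common pooling variants.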
This is interesting because I've been considering doing something like this for a function I'm working on. The function would essentially be XLOOKUP with the capability of returning a matrix. I've created a function previously with some basic recursion to do this but the new function would take things further with the goal of optimizing calculation time as much as possible to handle very large sets of data. For this reason the function would default to a binary search to avoid the "slowness" of a linear search. There's also the matter of sorting the arrays to accommodate the binary lookup. The function would utilize some recursion and maybe some thunks. A lot of testing would need to be done.
I'll need to set aside some time to study your contributions, too. There's a nice variety in authorship styles on this forum and it's interesting to see how others think.
Thank you for sharing those links. I had not heard of Wilkinson before checking them out. He's in rare company with those who have won the Turing Award and/or the von Neumann prize. Wilkinson's advantage made me wonder whether there is a parallel with Excel. Do those who studied CSE arrays and worked with arrays prior even to the advent of SUMIFS, COUNTIFS, AVERAGEIFS, etc. have an advantage when working with Excel 365? Three years ago I would have said yes, but some of the older methods have become obsolete. Someone new to Excel can jump in with dynamic arrays and not waste a moment on three-finger formulas! I'm beginning to wonder if even COUNTIF/SUMIF, etc. are still relevant. GROUPBY can handle those tasks, and I find deploying COUNTIF/SUMIF, etc. through Lambda helpers is not usually efficient.
- joelb95 · Nov 03, 2024 · Brass Contributor

REDUCE merely requires a single array (of any dimension) as a return - it doesn't inherently limit the type of return (so far as I know). If you are willing to thunk thunks, your return stack can be fairly deep before you notice major slowdowns. You could, for instance, turn each row of an array into a thunk and HSTACK them iteratively via REDUCE, thereby giving you a row of rows. You can write an access function along the lines of
lambda(
outer_address,
row_of_thunks,
lambda(
inner_address,
index(
index(
row_of_thunks,
1,
outer_address
)(),
1,
inner_address
)
)
In this way, thunks just become 2d buckets that could be anything at all and arrays are just the scaffolding for a basic thunk arrangement. Depending how you choose to write your functions, you can have each thunk in an array represent a different state you want to persist and pass it easily from one name scope to another.
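Translated into Python closures (0-based indexing rather than Excel's 1-based; names and data are mine), the 'row of thunks' and its access function look like this:

```python
# Wrap each row in a zero-argument lambda (a thunk); the row of thunks is
# the only thing passed around until a value is actually needed.
data = [[10, 20, 30], [40, 50, 60]]
row_of_thunks = [(lambda row=row: row) for row in data]

def make_accessor(thunks, outer_address):
    """Fix the outer address; the returned function takes the inner address,
    forces the selected thunk, and indexes into the revealed row."""
    return lambda inner_address: thunks[outer_address]()[inner_address]

second_row = make_accessor(row_of_thunks, 1)
value = second_row(2)  # forces the second thunk, then picks its third element
```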
I've also been thinking a bit about the downside of just adding everything to a single 2D array, with more explicit contained-array tracking as the first few columns; that way you can set a particular array as a variable name and have it extracted from the array as you require. Outside of the inherent limits of accessing an array and deconstructing it using TAKE/DROP/CHOOSECOLS/etc., there is no overhead to maintaining the array and no stacking of functions and dependency chains for the data.

- PeterBartholomew1 · Nov 03, 2024 · Silver Contributor
Your mention of Python caused me to digress somewhat this afternoon! How does the Excel formula language compare with Python? Is the difference in the language itself, or in the effort committed by a wider community developing libraries? For example, when I read that Excel now has a claim to be a Turing-complete language, I set out to implement some complex programming tasks. I implemented a 4th-order Runge-Kutta integration routine (that proved remarkably straightforward and even elegant). The really hard task was the fast Fourier transform working over the complex roots of unity. That too proved possible and, to my surprise, I wasn't the only person to undertake that particular task!
Then I realised, I was thinking of Excel formulas as the development language. The question might be, are the Python libraries developed using Python? Some might be, but typically they will be written in C and are compiled executables. However it turns out that some of the matrix manipulation algorithms are Fortran! As a former Fortran programmer, that grabbed my attention. Mention was made of libraries such as LINPACK that I remembered coming across early in my career.
Then I came across a transcript from the Society for Industrial and Applied Mathematics of an interview with Jack J. Dongarra conducted by Thomas Haigh. That was all very interesting, but the statement that "the mathematical software that he developed was largely based on algorithms and ideas that were the work of Jim Wilkinson and his colleagues" grabbed my attention. Wilkinson was someone I had come across at the beginning of my Civil Service career; my head of section visited Wilkinson and used his routines in the early days of finite element research.
That in turn took me to J. H. Wilkinson - A.M. Turing Award Laureate; the award being given in 1970 (the year I started work, all green behind the ears). At the end of the article I read "By the time he retired, Jim had been given the rank of Special Merit Chief Scientific Officer within the British Civil Service, a category which is very rarely used and allows one to continue with their research without having to worry about other duties". That was something of an understatement, at the time there were about 11,000 scientists working for the various labs of the UK MOD and only two ever held that rank (I worked with Eric Mansfield - the other one).
Then came a further surprise. "At the end of the war, rather than returning to Cambridge, he joined the National Physical Laboratory (NPL) staff where he worked with Alan Turing on problems associated with Turing’s proposal to build an electronic computer". I had no idea that he had worked with Turing!
I read, "Wilkinson had one great advantage over von Neumann and Turing. He had been obliged to solve 18 equations in 18 unknowns, with a hand cranked mechanical calculator, during WWII. He had seen how amazingly accurate the direct method was. That arduous exercise helped Wilkinson think in a different way from these other experts in accounting for those roundoff errors". Not everyone can say that they overcame problems that von Neumann thought might prove unsurmountable!
All very interesting, but not what I was meant to be doing today!
- Patrick2788 · Nov 03, 2024 · Silver Contributor
This is an excellent function and a very efficient use of dynamic programming. It's interesting that even though our approaches to this task are different there are still some similarities in delivering a solution. Your solution is quite optimal.
At present, when I approach a task in Excel, I go through the options available to solve the problem: dynamic programming, recursion, thunks, and a combination of the above. Dynamic programming still rules the day.
Excel's weakness is still its lack of support for nested arrays, underscored by the inflexibility of the Lambda helper functions. The two options for working around this are MAKEARRAY and REDUCE. The downside of the former is needing to know the dimensions going in and having to 'check' each element in the return matrix; the downside of reduction, of course, is the need to repeatedly stack the accumulated results.
This is another 3-course meal! This will keep me busy for a while. I like how this builds on the existing functions in your module. The average user needn't see any of this programming but can do some pretty slick things. The complexity being hidden.
The process is reasonably efficient, but I would prefer Microsoft were to implement a STACK function that would expand any 2D array of thunks; no calculation I could perform would be a match for a native function in compiled code.
This is a fascinating idea and probably the next best thing to full support of nested arrays. Microsoft would need to officially recognize thunks for this to happen. Might as well also make THUNK a native function!
re: flattening a 'list' or 'nested arrays'
Python does this very simply through vstack (and quite a few other functions from I gather) :
Code written by a total novice!
import itertools as it
import numpy as np

# Set list of items to choose from and how many to take.
items = np.array(xl("Items")).tolist()
k = xl("pick")

# Generate a list of all possible combinations.
combinations = list(it.combinations_with_replacement(items, k))

# Unpack nested array and wrap rows according to the number of items
# in each combination to present in sheet.
np.vstack(combinations).reshape(-1, k)

- PeterBartholomew1 · Nov 02, 2024 · Silver Contributor
We seem to have a good set of alternative strategies for reshaping multidimensional arrays!
I went a bit off track with my latest workbook and, instead of confining myself to the challenge of reshaping multidimensional arrays, I also set out to apply functions such as BYCOL(arr, SUM) to each of the inner 2D arrays. Rather than introducing INDEX, I used TAKE and DROP (along with MAKEARRAY) to return the submatrices. In theory, that would allow me to apply SUMIFS to each submatrix, though I haven't actually done that.
My function INJECTλ both introduces the function with a helper function and (optionally) reshapes the outer array using TOROW or TOCOL. What was a new experiment for me was to use a function
= INJECTλ(nestedArr, depth, width, SUM, BYCOL, [TOCOL])
with the signature: array, scalar, scalar, eta reduced function, helper function, eta reduced function.
Patrick2788 Encouraged by the success of that function, I confess to tampering with the final section of your function which wraps the vector according to the desired shape:
// "Wrap vector according to desired shape." WRAP, IF(shape, WRAPCOLS, WRAPROWS), reshaped, WRAP(unpacked, depth * width), reshapedIf 'shape' is omitted the default is zero so it will test as FALSE. The thing that interested me, is that WRAPROWS and WRAPCOLS (and hence WRAP) are all TYPE 128, (functions). Such a conditional formula was something of a departure from my normal practice ... and it works (I think)!
- djclements · Nov 02, 2024 · Silver Contributor
Nice work Patrick. Interesting use of thunks for this task. It's mind-bending, to be sure!
The concept is very similar to a scan by row/column project I worked on recently, but with blocks as an additional dimension. Here's my take using SORTBY instead of thunks:
FLATTENBLOCKS:

=LAMBDA(array, [block_height], [block_width], [to_cols], [order_mode], [pad_with],
    LET(
        pad_with, IF(ISOMITTED(pad_with), NA(), pad_with),
        sort_by_col, CHOOSE(order_mode + 1, 0, 1, 0, 1),
        scan_by_col, CHOOSE(order_mode + 1, 0, 0, 1, 1),
        h, IF(ISOMITTED(block_height), ROWS(array), block_height),
        w, IF(ISOMITTED(block_width), COLUMNS(array), block_width),
        r, CEILING.MATH(ROWS(array), h),
        c, CEILING.MATH(COLUMNS(array), w),
        i, ROUNDUP(SEQUENCE(r) / h, 0),
        j, ROUNDUP(SEQUENCE(, c) / w, 0),
        a, TOCOL(EXPAND(array, r, c, pad_with), , scan_by_col),
        y, TOCOL(IF(j, i), , scan_by_col),
        x, TOCOL(IF(i, j), , scan_by_col),
        v, IF(sort_by_col, SORTBY(a, x, 1, y, 1), SORTBY(a, y, 1, x, 1)),
        IF(to_cols, WRAPCOLS(v, h * w), WRAPROWS(v, h * w))
    )
)

The [order_mode] argument has four options that affect both the order in which the blocks are processed and the order in which the nested elements are processed:
- 0 - blocks by row; elements by row (default)
- 1 - blocks by column; elements by row
- 2 - blocks by row; elements by column
- 3 - blocks by column; elements by column
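The sort-key idea behind FLATTENBLOCKS translates directly: give every cell a composite key of (block coordinates, within-block coordinates) and sort. A Python sketch under the same four order modes (padding omitted for brevity; names mine):

```python
def flatten_blocks(matrix, h, w, order_mode=0):
    """Order every cell by block position, then by position within its block,
    and wrap h*w values per output row -- the SORTBY strategy in miniature."""
    blocks_by_col = order_mode in (1, 3)  # modes 1 and 3 walk blocks by column
    elems_by_col = order_mode in (2, 3)   # modes 2 and 3 walk elements by column
    cells = [(r, c) for r in range(len(matrix)) for c in range(len(matrix[0]))]

    def key(cell):
        r, c = cell
        block = (c // w, r // h) if blocks_by_col else (r // h, c // w)
        inner = (c % w, r % h) if elems_by_col else (r % h, c % w)
        return block + inner

    flat = [matrix[r][c] for r, c in sorted(cells, key=key)]
    return [flat[i:i + h * w] for i in range(0, len(flat), h * w)]
```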
While I haven't come up with a scenario to test "double thunking" as of yet, I have been experimenting with thunks to some degree, using strategic arguments instead of the generic parameter-less variety. A simple demonstration for this is when the SCAN function needs to reference data from multiple columns. The standard solution, of course, is to use SEQUENCE-ROWS with INDEX. An alternative approach, however, is to use MAP to generate an array of lambda functions containing the relevant column data for each row, with a single argument for the accumulator. SCAN then calls each function by passing the accumulated value to its only argument.
Consider the following basic example:
The standard SEQUENCE-ROWS-INDEX method might look something like this:
=LET(
    tbl, A1:B19,
    amt, DROP(tbl, 1, 1),
    add, DROP(tbl, 1, -1) = DROP(tbl, -1, -1),
    SCAN(0, SEQUENCE(ROWS(amt)), LAMBDA(a, v, a * INDEX(add, v) + INDEX(amt, v)))
)

Whereas a thunk equivalent with MAP might be:
=LET(
    tbl, A1:B19,
    fxλ, MAP(
        DROP(tbl, 1, 1),
        DROP(tbl, 1, -1) = DROP(tbl, -1, -1),
        LAMBDA(amt, add, LAMBDA(acc, acc * add + amt))
    ),
    SCAN(0, fxλ, LAMBDA(a, λ, λ(a)))
)

While relatively efficient, it's been my observation that methods using thunks tend to be a split second slower than methods without (in general). This seems to be due to the extra pass on the array that's needed: one pass to generate the thunks and one pass to return the results. However, they do add a level of elegance and are fun to experiment with. And you never know when they might come in handy... in some complex scenarios, they may well be the only feasible option. Anyways, happy coding! 😉
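The MAP-of-lambdas pattern has a direct Python analogue: build one closure per row that captures that row's columns up front, then let the scan pass only the accumulator (the data below is invented for illustration):

```python
def scan(init, items, fn):
    """Excel-style SCAN: return every intermediate accumulation."""
    out, acc = [], init
    for item in items:
        acc = fn(acc, item)
        out.append(acc)
    return out

# Running totals that reset whenever the key changes, as in the example.
rows = [("A", 10), ("A", 5), ("B", 7), ("B", 3)]
keys = [k for k, _ in rows]
same_as_prev = [False] + [keys[i] == keys[i - 1] for i in range(1, len(keys))]

# One closure per row: the columns are captured now; only `acc` arrives later.
thunks = [(lambda acc, amt=amt, add=add: acc * add + amt)
          for (_, amt), add in zip(rows, same_as_prev)]

running = scan(0, thunks, lambda acc, th: th(acc))  # [10, 15, 7, 10]
```

As in the Excel version, the scan itself never touches the source columns; each closure already knows everything about its row except the accumulated value.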