Double Thunking Works Wonders!

Given that most Excel users would not dream of employing one thunk, you might well ask why even consider nested thunks! The use case explored here is to return all the combinations by which one migh...

LevenschteinDistance - reduce.xlsx54 KB

excel

Formulas and Functions

Silver Contributor

Nov 01, 2024

joelb95

Thank you for your contribution to the discussion. My somewhat tenuous understanding of what you are doing with your chunk function, is to stay in the world of real arrays but to build complex relationships between indices to relate the output arrays to the appropriate terms of the input arrays?

"The only reason I haven't stuck with the thunks is because you can't meaningfully persist them and their use case is basically limited to the exact formula you are currently constructing"

The first part is true and, in part, is one of the reasons I am moving to larger and larger blocks of functionality within a single Lambda function; possibly one formula cell per worksheet.

As for the second part, one of my goals is to conceal the complexity of thunks from the developer/user. I have specifically targeted the 'array of array' problem, which appears to sacrifice future functionality in order to maintain compatibility with past practice (though I doubt that any workbooks that would be affected even exist). I have written versions of the Lambda helper functions that have the same argument signature as the helper function they call but then convert both the user's arguments including their Lambda function to work with thunks. The process is reasonably efficient, but I would prefer Microsoft were to implement a STACK function that would expand any 2D array of thunks; no calculation I could perform would be a match for a native function in compiled code.

joelb95

Brass Contributor

Nov 03, 2024

PeterBartholomew1

My chunk function (sel.chunk) is just the combination of a slice by rows (sel.sliceRows) and a slice by columns (sel.sliceCols). The reason for all of the parameters is so that you can specify the starting and ending index for the row slice and the column slice using one of three methods:

a) absolute values(i.e. references to the start of the dimension), (e.g. start at column at end at column 6)

b) as values with reference to the end of the dimension, or (e.g. start at 8 columns from the end of the array and end at 3 columns from the end of the array),

c) as values with reference to the other value (e.g. start at column 5 and end 3 columns later or end at column 9 and start 4 columns earlier).

All of the index management is handled within the formula so you can almost think of it as a declarative type call - "Give me the slice with these dimensions". All of the validation/error checking happens within the final formula (sel.calcSliceStartAndEnd) which is dimension agnostic - you just specify the total dimension size and how the start and end indices should be determined.

The default values and ordering of the parameters are aimed at the standard use case - you tell it the starting index and you tell it either the ending index or how many additional units of the dimension you want (start at row 4 end at row 8 or start at row 4 and include 5 rows), but if you don't specify an end or a start, that just means "from the start give me 8 rows" or "from row 8 give me to the end." This way more complicated dimension slicing can be specifically referenced if needed and otherwise ignored entirely.

---------an FYI on my naming approach/syntax -------------

Syntax:

Functions in camel case,

Variables in snake case,

Classes (in other contexts) in pascal case, and

Global variables in all caps.

In a more nuanced setting, I'd also use leading underscores for "helper" type objects/functions. You don't see it in the "sel" module context, but I generally try to have functions that construct other functions start with "make", functions that retrieve something start with "get", and functions that are about novel value/object creation start with something like "calc".

Excel naming approach:

Module names should be intuitively related to the "user" type functions in the module (like "sel" stands for "select" and it includes the slicing and chunking functions),

Function names plus the module context should give you a strong intuitive sense of what the function expects/returns,

Variable names should essentially be self-documenting why they exist, and

Parameter names should self-document what they expect and end with "_df_" if there is a default value followed by what the default value is where the default value should strongly suggest what a valid argument type would be (e.g. "_df_TRUE" for when a parameter expects a boolean or "_df_2" for when it expects an int/number").

My main motivation with parameter names is that excel provides no type hints or hover information for users, so the only way they will know what valid entries are for the function is to divine it from the parameter name, to refer to other documentation, or to go read the function definition, and I'd rather the user have minimal friction (or learning curve) when invoking a function.

Forum Discussion

Double Thunking Works Wonders!

Resources