Forum Discussion
Double Thunking Works Wonders!
I've been working on a function that utilizes thunks. The rough draft of the function utilizes a single thunk to re-shape a matrix by specifying the depth and width.
The general idea:
Start with an input matrix that can be re-shaped into 1 of 2 possibilities. My working solution breaks the matrix into smaller matrices with the help of thunks. I hope to have something to share soon. At present the function is still undergoing rigorous testing and I need to determine what to do in situations where the number of columns and/or rows is not even.
(P.S. Updated chunk function to proper version)
Funnily enough I've been working on something somewhat similar, but mine is solidly within the bucket of what I'd call chunking. The primary solution I'm currently working towards involves precalculating the spacial arrangement of the chunks and then making iterators by way of things like map, bycol, or byrow. For the most part I do fancy accumulating stuff inside of a reduce combined with an iter function, but back in the day I was using thunks of thunks as list like objects and then converting those lists into 2d arrays by use of a layout engine (which included array dimension normalization logic and permitted the combination of arrays with different dimensions without unanticipated fill values). The only reason I haven't stuck with the thunks is because you can't meaningfully persist them and their use case is basically limited to the exact formula you are currently constructing.
What makes thunks helpful is that they are essentially functions and can be treated as values up until a context where you want them to be evaluated (accomplished by the "()" syntax). Any excel context which permits you to evaluate a function will give you access to the contents, so it all comes down to creatively playing the shell game of unwinding your thunks.
Also, consider stacking thunks in a 2 column array where the first column is some sort of identifier and you use a selection mechanism that can select the thunk you want by the identifier - it is much easier than having to iter through each thunk to find out what it is. And for what it is worth, you can very easily create a thunk, calculate some value from that thunk, and then save that value as a thunk. If you accumulate those values with your thunk, you can think of it is a caching mechanism and take advantage of key value pairs.
And just because, here is my latest iteration of an array chunker/slicer. Using this in an iter and then aggregating each chunk as a thunk is something even a punk could do.
//sel module
// for taking portions of other things.
chunk =
LAMBDA(
target_array,
[start_at_row],
[end_at_row],
[start_at_col],
[end_at_col],
[no_of_next_rows_to_include],
[no_of_next_cols_to_include],
[rows_to_stop_from_end_of_array],
[cols_to_stop_from_end_of_array],
[start_at_rows_before_end_of_slice],
[start_at_cols_before_end_of_slice],
[start_at_rows_before_end_of_array],
[start_at_cols_before_end_of_array],
[accept_partial_df_FALSE],
[return_first_row_if_no_start_df_FALSE],
[return_first_col_if_no_start_df_FALSE],
[return_last_row_if_no_end_df_FALSE],
[return_last_col_if_no_end_df_FALSE],
LET(
sliced_rows,
sliceRows(
target_array,
start_at_row,
end_at_row,
no_of_next_rows_to_include,
rows_to_stop_from_end_of_array,
start_at_rows_before_end_of_slice,
start_at_rows_before_end_of_array,
accept_partial_df_FALSE,
return_first_row_if_no_start_df_FALSE,
return_last_row_if_no_end_df_FALSE
),
error_in_row_slice, iserror(index(sliced_rows,1,1)),
sliced_rows_and_columns,
if(
error_in_row_slice,
error.type(3),
sliceCols(
sliced_rows,
start_at_col,
end_at_col,
no_of_next_cols_to_include,
cols_to_stop_from_end_of_array,
start_at_cols_before_end_of_slice,
start_at_cols_before_end_of_array,
accept_partial_df_FALSE,
return_first_col_if_no_start_df_FALSE,
return_last_col_if_no_end_df_FALSE
)
),
sliced_rows_and_columns
)
);
sliceRows =
LAMBDA(
target_array,
[start_at_row],
[end_at_row],
[no_of_next_rows_to_include],
[rows_to_stop_from_end_of_array],
[start_at_rows_before_end_of_slice],
[start_at_rows_before_end_of_array],
[accept_partial_df_FALSE],
[return_first_if_no_start_df_FALSE],
[return_last_if_no_end_df_FALSE],
LET(
row_bounds,
calcSliceStartAndEnd(
columns(target_array),
start_at_row,
end_at_row,
no_of_next_rows_to_include,
rows_to_stop_from_end_of_array,
start_at_rows_before_end_of_slice,
start_at_rows_before_end_of_array,
return_first_if_no_start_df_FALSE,
return_last_if_no_end_df_FALSE,
accept_partial_df_FALSE
),
start_row, index(row_bounds,1,1),
end_row, index(row_bounds,1,2),
return_trimmed_array,
columns(row_bounds)=3
,
trimmed_to_starting_row,
if(
return_trimmed_array,
DROP(
target_array,
start_row- 1
),
FALSE
),
trimmed_to_ending_row,
if(
return_trimmed_array,
take(
trimmed_to_starting_row,
end_row - start_row + 1
),
FALSE
),
result,
if(
return_trimmed_array,
trimmed_to_ending_row,
ERROR.TYPE(3)
),
result
)
);
sliceCols =
LAMBDA(
target_array,
[start_at_col],
[end_at_col],
[no_of_next_cols_to_include],
[cols_to_stop_from_end_of_array],
[start_at_cols_before_end_of_slice],
[start_at_cols_before_end_of_array],
[accept_partial_df_FALSE],
[return_first_col_if_no_start_df_FALSE],
[return_last_col_if_no_end_df_FALSE],
LET(
col_bounds,
calcSliceStartAndEnd(
columns(target_array),
start_at_col,
end_at_col,
no_of_next_cols_to_include,
cols_to_stop_from_end_of_array,
start_at_cols_before_end_of_slice,
start_at_cols_before_end_of_array,
return_first_col_if_no_start_df_FALSE,
return_last_col_if_no_end_df_FALSE,
accept_partial_df_FALSE
),
start_col, index(col_bounds,1,1),
end_col, index(col_bounds,1,2),
return_trimmed_array,
columns(col_bounds)=3
,
trimmed_to_starting_col,
if(
return_trimmed_array,
DROP(
target_array,
,
start_col- 1
),
FALSE
),
trimmed_to_ending_col,
if(
return_trimmed_array,
take(
trimmed_to_starting_col,
,
end_col - start_col + 1
),
FALSE
),
result,
if(
return_trimmed_array,
trimmed_to_ending_col,
ERROR.TYPE(3)
),
result
)
);
calcSliceStartAndEnd =
LAMBDA(
main_dimension_size,
[start_at_idx],
[end_at_idx],
[end_units_from_start_of_slice],
[end_units_from_end_of_dimension],
[start_units_from_end_of_slice],
[start_units_from_end_of_dimension],
[return_first_if_no_from_df_FALSE],
[return_last_if_no_through_df_FALSE],
[accept_partial_df_FALSE],
let(
// Parameter usage indicators
start_idx_provided, NOT(ISOMITTED(start_at_idx)),
calc_start_idx_from_end_of_slice,not(isomitted(start_units_from_end_of_slice)),
calc_start_idx_from_end_of_dimension,not(isomitted(start_units_from_end_of_dimension)),
start_from_first_unit_as_default,if(isomitted(return_first_if_no_from_df_FALSE),FALSE,return_first_if_no_from_df_FALSE),
end_idx_provided,not(isomitted(end_at_idx)),
calc_end_idx_from_start_of_slice,not(isomitted(end_units_from_start_of_slice)),
calc_end_idx_from_end_of_dimension,not(isomitted(end_units_from_end_of_dimension)),
end_at_final_unit_as_default,if(isomitted(return_last_if_no_through_df_FALSE),FALSE,return_last_if_no_through_df_FALSE),
at_least_one_relative_position, calc_start_idx_from_end_of_slice * calc_end_idx_from_start_of_slice,
accept_partial, if(isomitted(accept_partial_df_FALSE),FALSE,accept_partial_df_FALSE),
// Valid parameter checking
valid_start_parameter_provided,
(
start_idx_provided+
calc_start_idx_from_end_of_slice+
calc_start_idx_from_end_of_dimension
) = 1
,
valid_end_parameter_provided,
(
end_idx_provided +
calc_end_idx_from_start_of_slice +
calc_end_idx_from_end_of_dimension
) = 1
,
valid_relative_start,
calc_start_idx_from_end_of_slice *
(end_idx_provided+calc_end_idx_from_end_of_dimension)
,
valid_relative_end,
calc_end_idx_from_start_of_slice *
(start_idx_provided+calc_start_idx_from_end_of_dimension)
,
valid_relative_parameter,
if(
at_least_one_relative_position,
(valid_relative_start+valid_relative_end) = 1,
TRUE
),
valid_parameters,
(valid_start_parameter_provided+start_from_first_unit_as_default) *
(valid_end_parameter_provided+end_at_final_unit_as_default) *
valid_relative_parameter
,
result,
if(
not(valid_parameters),
error.type(3),
let(
abs_start,
ifs(
valid_relative_start,
FALSE,
start_idx_provided,
start_at_idx,
calc_start_idx_from_end_of_dimension,
main_dimension_size-start_units_from_end_of_dimension,
1,
error.type(3)
),
abs_end,
ifs(
valid_relative_end,
FALSE,
end_idx_provided,
end_at_idx,
end_units_from_end_of_dimension,
main_dimension_size-end_units_from_end_of_dimension
),
start_idx,
if(
valid_relative_start,
abs_end-start_units_from_end_of_slice-1,
abs_start
),
end_idx,
if(
valid_relative_end,
abs_start+end_units_from_start_of_slice-1,
abs_end
),
is_valid_slice,
and(
end_idx>=start_idx,
start_idx>0,
((end_idx<=main_dimension_size)+accept_partial)
),
if(
is_valid_slice,
hstack(start_idx, end_idx,is_valid_slice),
error.type(3)
)
)
),
result
)
);
- joelb95Nov 03, 2024Brass Contributor
My chunk function (sel.chunk) is just the combination of a slice by rows (sel.sliceRows) and a slice by columns (sel.sliceCols). The reason for all of the parameters is so that you can specify the starting and ending index for the row slice and the column slice using one of three methods:
a) absolute values(i.e. references to the start of the dimension), (e.g. start at column at end at column 6)
b) as values with reference to the end of the dimension, or (e.g. start at 8 columns from the end of the array and end at 3 columns from the end of the array),
c) as values with reference to the other value (e.g. start at column 5 and end 3 columns later or end at column 9 and start 4 columns earlier).
All of the index management is handled within the formula so you can almost think of it as a declarative type call - "Give me the slice with these dimensions". All of the validation/error checking happens within the final formula (sel.calcSliceStartAndEnd) which is dimension agnostic - you just specify the total dimension size and how the start and end indices should be determined.
The default values and ordering of the parameters are aimed at the standard use case - you tell it the starting index and you tell it either the ending index or how many additional units of the dimension you want (start at row 4 end at row 8 or start at row 4 and include 5 rows), but if you don't specify an end or a start, that just means "from the start give me 8 rows" or "from row 8 give me to the end." This way more complicated dimension slicing can be specifically referenced if needed and otherwise ignored entirely.
---------an FYI on my naming approach/syntax -------------
Syntax:
Functions in camel case,
Variables in snake case,
Classes (in other contexts) in pascal case, and
Global variables in all caps.
In a more nuanced setting, I'd also use leading underscores for "helper" type objects/functions. You don't see it in the "sel" module context, but I generally try to have functions that construct other functions start with "make", functions that retrieve something start with "get", and functions that are about novel value/object creation start with something like "calc".
Excel naming approach:
Module names should be intuitively related to the "user" type functions in the module (like "sel" stands for "select" and it includes the slicing and chunking functions),
Function names plus the module context should give you a strong intuitive sense of what the function expects/returns,
Variable names should essentially be self-documenting why they exist, and
Parameter names should self-document what they expect and end with "_df_" if there is a default value followed by what the default value is where the default value should strongly suggest what a valid argument type would be (e.g. "_df_TRUE" for when a parameter expects a boolean or "_df_2" for when it expects an int/number").
My main motivation with parameter names is that excel provides no type hints or hover information for users, so the only way they will know what valid entries are for the function is to divine it from the parameter name, to refer to other documentation, or to go read the function definition, and I'd rather the user have minimal friction (or learning curve) when invoking a function.
- PeterBartholomew1Nov 01, 2024Silver Contributor
Thank you for your contribution to the discussion. My somewhat tenuous understanding of what you are doing with your chunk function, is to stay in the world of real arrays but to build complex relationships between indices to relate the output arrays to the appropriate terms of the input arrays?
"The only reason I haven't stuck with the thunks is because you can't meaningfully persist them and their use case is basically limited to the exact formula you are currently constructing"
The first part is true and, in part, is one of the reasons I am moving to larger and larger blocks of functionality within a single Lambda function; possibly one formula cell per worksheet.
As for the second part, one of my goals is to conceal the complexity of thunks from the developer/user. I have specifically targeted the 'array of array' problem, which appears to sacrifice future functionality in order to maintain compatibility with past practice (though I doubt that any workbooks that would be affected even exist). I have written versions of the Lambda helper functions that have the same argument signature as the helper function they call but then convert both the user's arguments including their Lambda function to work with thunks. The process is reasonably efficient, but I would prefer Microsoft were to implement a STACK function that would expand any 2D array of thunks; no calculation I could perform would be a match for a native function in compiled code.