Forum Discussion
Double Thunking Works Wonders!
If I understand the thinking, if one is using thunks it might make sense to thunk again to reduce the output to a scalar for calculation efficiency (it's easier to toss around a scalar than a large matrix, for example). This makes sense to me on a basic level but goes against my natural Excel instincts, which are wired to adhere to the limitations of the current calculation engine.
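If it helps, the 'double thunk' idea might be sketched like this. The name rethunkλ is my own, purely illustrative: the inner thunk is evaluated once, its array is collapsed to a scalar, and that scalar is re-wrapped in a fresh zero-argument LAMBDA so only a single value is carried through later stages.

```excel
// Illustrative sketch only: 'rethunkλ' is a hypothetical helper, not a built-in.
rethunkλ = LAMBDA(thunkλ,
    LET(
        total, SUM(thunkλ()),   // evaluate the inner thunk once; reduce the array to a scalar
        LAMBDA(total)           // the 'double thunk': a thunk holding just the scalar
    )
);
```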
The more I play around with thunks, the more my head spins with ideas of how I can improve functions I've built. I have a magic square Lambda that contains 3 algorithms depending on the integer. 2 of the algorithms could possibly be improved by the idea of double thunking because of the heavy use of MAP. I may get around to it after I digest this meal!
I seem to have struggled with thunks for quite a time now. The point I have reached is to think of them as a code fragment, held unevaluated, that can be run at some later point. Rather like a defined Name, the formula may be little more than a reference or it may be a major calculation. A thunk that references a LET variable minimises the burden of recalculation.
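In case it helps other readers, a minimal illustration of that mental model, a thunk being nothing more than a zero-argument LAMBDA:

```excel
=LET(
    data,   SEQUENCE(1000),
    thunkλ, LAMBDA(SUMPRODUCT(data, data)),  // deferred: nothing is calculated yet
    thunkλ()                                  // the trailing () triggers the evaluation
)
```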
The key point in this context is that the code invoked by a thunk may itself contain further functions, including thunks. Hence I can pass arrays of thunks to a function, like SCAN, that expects a simple array of values.
Since one is passing around code that represents deferred calculations, it is difficult to know whether it is correct or not. That is where the tricks I described above come in. By isolating a single thunk and evaluating it, one can check that one is on track to return the desired result. It is all pretty mind-stretching though. What I have been aiming at is to embed the complexity into purpose-written helper functions that may be used subsequently without requiring understanding of 'how' they work. That is very different from the traditional spreadsheet paradigm, where the 'how' is blatantly obvious, being no more than grade-school arithmetic, but the 'why' is obscured by the volume of cascading cell formulas.
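The isolation trick mentioned above can be sketched as follows: INDEX pulls one thunk out of the array, and appending () evaluates just that one so the intermediate result can be inspected.

```excel
=LET(
    // An array of five thunks, each deferring a differently-sized vector.
    thunks, MAP(SEQUENCE(5), LAMBDA(k, LAMBDA(SEQUENCE(k)))),
    t,      INDEX(thunks, 3),   // isolate the third thunk
    t()                          // evaluate it alone to check the result
)
```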
BTW, I have just read that the Double Diamond brand of beer is being reintroduced, so soon I may be able to determine whether Double Diamond (hopefully not the indifferent keg-ale with carbon dioxide pressurisation) does still 'work wonders'.
- Patrick2788, Oct 31, 2024, Silver Contributor
I've been working on a function that utilizes thunks. The rough draft of the function utilizes a single thunk to re-shape a matrix by specifying the depth and width.
The general idea:
Start with an input matrix that can be re-shaped into 1 of 2 possibilities. My working solution breaks the matrix into smaller matrices with the help of thunks. I hope to have something to share soon. At present the function is still undergoing rigorous testing and I need to determine what to do in situations where the number of rows and/or columns does not divide evenly.
- PeterBartholomew1, Nov 01, 2024, Silver Contributor
Your post drew my attention to the fact that, although I have given a lot of attention to unravelling arrays of thunks that arise 'naturally' as workarounds for the 'array of array', 'nested arrays' and 'array of ranges' limitations of Excel, I have paid little attention to the modelling of nested multi-dimensional arrays.
This is my attempt to rectify that situation. By all means ignore the post, at least until you have completed your testing. I would hate to come between a man and his rigorous testing regime or even spoil the fun of the chase! I think it is a worthwhile problem because Excel data quite often seems to be prepared as an array of forms, each containing array data. That, despite the wishes of the more database-oriented of our friends, who would prefer to see all data in normalised tables.
In fact, you do show the process of normalising the data as one of your outputs, but my past experience is that the steps that allow the data to be normalised as a starting point for standard solution processes often allow the desired output to be derived directly from the cross-tabulated input.
I will show a picture of my worksheet because that will allow you to see what I have done without necessarily getting tied up with the 'how'.
- Patrick2788, Nov 02, 2024, Silver Contributor
I think the function is at a good point where I'm ready to share. I can think of further improvements that can be made (e.g. a 'direction' parameter to offer both 'scan by column' and 'scan by row' and even the option to return the output as a vector), but the tinkering may never end! Adding parameter checks and more sophisticated error prevention can wait, too.
PeterBartholomew1's EVALTHUNKARRλ unpacked the thunks roughly 10x faster than conventional unpacking with REDUCE.
//Re-shape a matrix by specifying width and depth.
//'shape' - 0 to vertically stack arrays, 1 to horizontally stack arrays.
//'shape' defaults to vertical stack if omitted.
//'pad_with' defaults to "" if omitted.
ReShapeMatrixλ = LAMBDA(matrix, depth, width, [shape], [pad_with],
    LET(
        //Prevent uneven distribution of rows and/or columns by
        //ensuring dimensions are properly rounded according to
        //depth and width.
        i, CEILING.MATH(ROWS(matrix), depth),
        j, CEILING.MATH(COLUMNS(matrix), width),
        //Set padding and expand matrix as needed.
        padding, IF(ISOMITTED(pad_with), "", pad_with),
        M, EXPAND(matrix, i, j, padding),
        //Create two matrices which serve as row and column coordinates
        //for INDEX.
        M_rows, SEQUENCE(i / depth, , , depth) * SEQUENCE(, j / width, , 0),
        M_cols, SEQUENCE(, j / width, , width) * SEQUENCE(i / depth, , , 0),
        //Generate 'r' and 'c' coordinates based on starting row (v₁) and column (v₂).
        //Retrieve 'block' from matrix as a vector and stuff in a thunk.
        CreateVectors, LAMBDA(v₁, v₂,
            LET(
                r, TOCOL(SEQUENCE(depth, , v₁) * SEQUENCE(, width, , 0)),
                c, TOCOL(SEQUENCE(, width, v₂) * SEQUENCE(depth, , , 0)),
                block, INDEX(M, r, c),
                THUNK(block)
            )
        ),
        //Map and convert to vector to prepare for unpacking.
        thunks, TOCOL(MAP(M_rows, M_cols, CreateVectors)),
        //Unpack thunks.
        unpacked, EVALTHUNKARRλ(thunks),
        //Wrap vector according to desired shape.
        vshaped, WRAPROWS(unpacked, depth * width),
        hshaped, WRAPCOLS(unpacked, depth * width),
        reshaped, IF(
            ISOMITTED(shape),
            vshaped,
            IF(shape = 1, hshaped, vshaped)
        ),
        reshaped
    )
);
- joelb95, Oct 31, 2024, Brass Contributor
(P.S. Updated chunk function to proper version)
Funnily enough I've been working on something somewhat similar, but mine is solidly within the bucket of what I'd call chunking. The primary solution I'm currently working towards involves precalculating the spatial arrangement of the chunks and then making iterators by way of things like MAP, BYCOL, or BYROW. For the most part I do fancy accumulating stuff inside of a REDUCE combined with an iter function, but back in the day I was using thunks of thunks as list-like objects and then converting those lists into 2D arrays by use of a layout engine (which included array-dimension normalization logic and permitted the combination of arrays with different dimensions without unanticipated fill values). The only reason I haven't stuck with the thunks is because you can't meaningfully persist them, and their use case is basically limited to the exact formula you are currently constructing.
What makes thunks helpful is that they are essentially functions and can be treated as values up until a context where you want them to be evaluated (accomplished by the "()" syntax). Any Excel context which permits you to evaluate a function will give you access to the contents, so it all comes down to creatively playing the shell game of unwinding your thunks.
Also, consider stacking thunks in a 2-column array where the first column is some sort of identifier and you use a selection mechanism that can select the thunk you want by the identifier - it is much easier than having to iterate through each thunk to find out what it is. And for what it is worth, you can very easily create a thunk, calculate some value from that thunk, and then save that value as a thunk. If you accumulate those values with your thunk, you can think of it as a caching mechanism and take advantage of key-value pairs.
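A sketch of that keyed arrangement (all names are illustrative, and it assumes, as recent builds allow, that lambdas can be stacked into an array): XMATCH finds the identifier and INDEX retrieves the matching thunk.

```excel
=LET(
    keys,   {"sales"; "costs"},                                    // identifier column
    thunks, VSTACK(LAMBDA(SEQUENCE(10)), LAMBDA(-SEQUENCE(10))),   // parallel column of thunks
    pickλ,  LAMBDA(key, INDEX(thunks, XMATCH(key, keys))),         // select a thunk by identifier
    LET(t, pickλ("costs"), t())                                    // retrieve it, then evaluate
)
```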
And just because, here is my latest iteration of an array chunker/slicer. Using this in an iter and then aggregating each chunk as a thunk is something even a punk could do.
//sel module
//for taking portions of other things.
chunk = LAMBDA(
    target_array,
    [start_at_row], [end_at_row],
    [start_at_col], [end_at_col],
    [no_of_next_rows_to_include], [no_of_next_cols_to_include],
    [rows_to_stop_from_end_of_array], [cols_to_stop_from_end_of_array],
    [start_at_rows_before_end_of_slice], [start_at_cols_before_end_of_slice],
    [start_at_rows_before_end_of_array], [start_at_cols_before_end_of_array],
    [accept_partial_df_FALSE],
    [return_first_row_if_no_start_df_FALSE], [return_first_col_if_no_start_df_FALSE],
    [return_last_row_if_no_end_df_FALSE], [return_last_col_if_no_end_df_FALSE],
    LET(
        sliced_rows, sliceRows(
            target_array, start_at_row, end_at_row,
            no_of_next_rows_to_include, rows_to_stop_from_end_of_array,
            start_at_rows_before_end_of_slice, start_at_rows_before_end_of_array,
            accept_partial_df_FALSE,
            return_first_row_if_no_start_df_FALSE, return_last_row_if_no_end_df_FALSE
        ),
        error_in_row_slice, ISERROR(INDEX(sliced_rows, 1, 1)),
        sliced_rows_and_columns, IF(
            error_in_row_slice,
            ERROR.TYPE(3),
            sliceCols(
                sliced_rows, start_at_col, end_at_col,
                no_of_next_cols_to_include, cols_to_stop_from_end_of_array,
                start_at_cols_before_end_of_slice, start_at_cols_before_end_of_array,
                accept_partial_df_FALSE,
                return_first_col_if_no_start_df_FALSE, return_last_col_if_no_end_df_FALSE
            )
        ),
        sliced_rows_and_columns
    )
);

sliceRows = LAMBDA(
    target_array,
    [start_at_row], [end_at_row],
    [no_of_next_rows_to_include], [rows_to_stop_from_end_of_array],
    [start_at_rows_before_end_of_slice], [start_at_rows_before_end_of_array],
    [accept_partial_df_FALSE],
    [return_first_if_no_start_df_FALSE], [return_last_if_no_end_df_FALSE],
    LET(
        row_bounds, calcSliceStartAndEnd(
            ROWS(target_array),
            start_at_row, end_at_row,
            no_of_next_rows_to_include, rows_to_stop_from_end_of_array,
            start_at_rows_before_end_of_slice, start_at_rows_before_end_of_array,
            return_first_if_no_start_df_FALSE, return_last_if_no_end_df_FALSE,
            accept_partial_df_FALSE
        ),
        start_row, INDEX(row_bounds, 1, 1),
        end_row, INDEX(row_bounds, 1, 2),
        return_trimmed_array, COLUMNS(row_bounds) = 3,
        trimmed_to_starting_row, IF(return_trimmed_array, DROP(target_array, start_row - 1), FALSE),
        trimmed_to_ending_row, IF(return_trimmed_array, TAKE(trimmed_to_starting_row, end_row - start_row + 1), FALSE),
        result, IF(return_trimmed_array, trimmed_to_ending_row, ERROR.TYPE(3)),
        result
    )
);

sliceCols = LAMBDA(
    target_array,
    [start_at_col], [end_at_col],
    [no_of_next_cols_to_include], [cols_to_stop_from_end_of_array],
    [start_at_cols_before_end_of_slice], [start_at_cols_before_end_of_array],
    [accept_partial_df_FALSE],
    [return_first_col_if_no_start_df_FALSE], [return_last_col_if_no_end_df_FALSE],
    LET(
        col_bounds, calcSliceStartAndEnd(
            COLUMNS(target_array),
            start_at_col, end_at_col,
            no_of_next_cols_to_include, cols_to_stop_from_end_of_array,
            start_at_cols_before_end_of_slice, start_at_cols_before_end_of_array,
            return_first_col_if_no_start_df_FALSE, return_last_col_if_no_end_df_FALSE,
            accept_partial_df_FALSE
        ),
        start_col, INDEX(col_bounds, 1, 1),
        end_col, INDEX(col_bounds, 1, 2),
        return_trimmed_array, COLUMNS(col_bounds) = 3,
        trimmed_to_starting_col, IF(return_trimmed_array, DROP(target_array, , start_col - 1), FALSE),
        trimmed_to_ending_col, IF(return_trimmed_array, TAKE(trimmed_to_starting_col, , end_col - start_col + 1), FALSE),
        result, IF(return_trimmed_array, trimmed_to_ending_col, ERROR.TYPE(3)),
        result
    )
);

calcSliceStartAndEnd = LAMBDA(
    main_dimension_size,
    [start_at_idx], [end_at_idx],
    [end_units_from_start_of_slice], [end_units_from_end_of_dimension],
    [start_units_from_end_of_slice], [start_units_from_end_of_dimension],
    [return_first_if_no_from_df_FALSE], [return_last_if_no_through_df_FALSE],
    [accept_partial_df_FALSE],
    LET(
        // Parameter usage indicators
        start_idx_provided, NOT(ISOMITTED(start_at_idx)),
        calc_start_idx_from_end_of_slice, NOT(ISOMITTED(start_units_from_end_of_slice)),
        calc_start_idx_from_end_of_dimension, NOT(ISOMITTED(start_units_from_end_of_dimension)),
        start_from_first_unit_as_default, IF(ISOMITTED(return_first_if_no_from_df_FALSE), FALSE, return_first_if_no_from_df_FALSE),
        end_idx_provided, NOT(ISOMITTED(end_at_idx)),
        calc_end_idx_from_start_of_slice, NOT(ISOMITTED(end_units_from_start_of_slice)),
        calc_end_idx_from_end_of_dimension, NOT(ISOMITTED(end_units_from_end_of_dimension)),
        end_at_final_unit_as_default, IF(ISOMITTED(return_last_if_no_through_df_FALSE), FALSE, return_last_if_no_through_df_FALSE),
        at_least_one_relative_position, calc_start_idx_from_end_of_slice * calc_end_idx_from_start_of_slice,
        accept_partial, IF(ISOMITTED(accept_partial_df_FALSE), FALSE, accept_partial_df_FALSE),
        // Valid parameter checking
        valid_start_parameter_provided,
            (start_idx_provided + calc_start_idx_from_end_of_slice + calc_start_idx_from_end_of_dimension) = 1,
        valid_end_parameter_provided,
            (end_idx_provided + calc_end_idx_from_start_of_slice + calc_end_idx_from_end_of_dimension) = 1,
        valid_relative_start, calc_start_idx_from_end_of_slice * (end_idx_provided + calc_end_idx_from_end_of_dimension),
        valid_relative_end, calc_end_idx_from_start_of_slice * (start_idx_provided + calc_start_idx_from_end_of_dimension),
        valid_relative_parameter, IF(
            at_least_one_relative_position,
            (valid_relative_start + valid_relative_end) = 1,
            TRUE
        ),
        valid_parameters,
            (valid_start_parameter_provided + start_from_first_unit_as_default) *
            (valid_end_parameter_provided + end_at_final_unit_as_default) *
            valid_relative_parameter,
        result, IF(
            NOT(valid_parameters),
            ERROR.TYPE(3),
            LET(
                abs_start, IFS(
                    valid_relative_start, FALSE,
                    start_idx_provided, start_at_idx,
                    calc_start_idx_from_end_of_dimension, main_dimension_size - start_units_from_end_of_dimension,
                    1, ERROR.TYPE(3)
                ),
                abs_end, IFS(
                    valid_relative_end, FALSE,
                    end_idx_provided, end_at_idx,
                    calc_end_idx_from_end_of_dimension, main_dimension_size - end_units_from_end_of_dimension
                ),
                start_idx, IF(valid_relative_start, abs_end - start_units_from_end_of_slice - 1, abs_start),
                end_idx, IF(valid_relative_end, abs_start + end_units_from_start_of_slice - 1, abs_end),
                is_valid_slice, AND(
                    end_idx >= start_idx,
                    start_idx > 0,
                    ((end_idx <= main_dimension_size) + accept_partial)
                ),
                IF(
                    is_valid_slice,
                    HSTACK(start_idx, end_idx, is_valid_slice),
                    ERROR.TYPE(3)
                )
            )
        ),
        result
    )
);

- PeterBartholomew1, Nov 01, 2024, Silver Contributor
Thank you for your contribution to the discussion. My somewhat tenuous understanding of what you are doing with your chunk function is that you stay in the world of real arrays but build complex relationships between indices to relate the output arrays to the appropriate terms of the input arrays. Is that right?
"The only reason I haven't stuck with the thunks is because you can't meaningfully persist them and their use case is basically limited to the exact formula you are currently constructing"
The first part is true and, in part, is one of the reasons I am moving to larger and larger blocks of functionality within a single Lambda function; possibly one formula cell per worksheet.
As for the second part, one of my goals is to conceal the complexity of thunks from the developer/user. I have specifically targeted the 'array of array' problem, which appears to sacrifice future functionality in order to maintain compatibility with past practice (though I doubt that any workbooks that would be affected even exist). I have written versions of the Lambda helper functions that have the same argument signature as the helper function they call, but which then convert the user's arguments, including their Lambda function, to work with thunks. The process is reasonably efficient, but I would prefer Microsoft to implement a STACK function that would expand any 2D array of thunks; no calculation I could perform would be a match for a native function in compiled code.
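As a sketch of the operation such a native STACK would perform, here is the straightforward (and admittedly slow) REDUCE formulation for a single column of thunks, each holding a 2D array; the name EvalThunkColλ is illustrative only.

```excel
// Sketch only: expand a column of thunks by evaluating and stacking each in turn.
// A native, compiled equivalent is what is being wished for above.
EvalThunkColλ = LAMBDA(thunks,
    DROP(
        REDUCE("", thunks,
            LAMBDA(acc, thunkλ, VSTACK(acc, thunkλ()))   // force each thunk, stack its array
        ),
        1   // discard the "" seed row; unequal widths are padded with #N/A by VSTACK
    )
);
```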
- PeterBartholomew1, Oct 31, 2024, Silver Contributor
If you haven't already looked at it, the module EVALTHUNKARRλ from
A version of Excel MAP helper function that will return an array of arrays
was intended to expand any 2D array of thunks representing embedded 2D arrays.
The patterns you illustrate are typical of multidimensional arrays. There, TOCOL will establish a possible order for the coefficients for the multiply-nested indices. WRAPROWS, WRAPCOLS, TRANSPOSE and TOROW (by row or column) will all reshape the array but the cyclic order of the indices remains unaltered. To change that, you either need to use SORT on one of the indices or to reduce the dimensionality using thunks and reshape the overall array with the internal thunk dimensions 'frozen in'.
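A small sketch of that point: reflowing with TOCOL and WRAPROWS leaves the cyclic order of the indices intact, whereas the scan_by_column option of TOCOL is one way to change which index varies fastest.

```excel
=LET(
    M, SEQUENCE(3, 4),                       // 3×4 array, values 1..12 laid out by row
    flat, TOCOL(M),                          // row-major scan: the column index varies fastest
    same, WRAPROWS(flat, 4),                 // reshaped, but the cyclic order is unchanged
    swapped, WRAPROWS(TOCOL(M, , TRUE), 3),  // scan by column: the row index now varies fastest
    same
)
```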
If this is irrelevant to the challenge you are addressing please ignore it. Either way, I look forward to seeing what you have done.