Forum Discussion
Double Thunking Works Wonders!
If I understand the thinking, if one is using thunks it might make sense to thunk again to reduce the output to a scalar for calculation efficiency (it's easier to toss around a scalar than a large matrix, for example). This makes sense to me on a basic level but goes against my natural Excel instincts, which are wired to adhere to the limitations of the current calculation engine.
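If it helps, the 'double thunk' idea might be sketched like this. The name rethunkλ is my own, purely illustrative: the inner thunk is evaluated once, its array is collapsed to a scalar, and that scalar is re-wrapped in a fresh zero-argument LAMBDA so only a single value is carried through later stages.

```excel
// Illustrative sketch only: 'rethunkλ' is a hypothetical helper, not a built-in.
rethunkλ = LAMBDA(thunkλ,
    LET(
        total, SUM(thunkλ()),   // evaluate the inner thunk once; reduce the array to a scalar
        LAMBDA(total)           // the 'double thunk': a thunk holding just the scalar
    )
);
```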
The more I play around with thunks, the more my head spins with ideas of how I can improve functions I've built. I have a magic square Lambda that contains 3 algorithms depending on the integer. 2 of the algorithms could possibly be improved by the idea of double thunking because of the heavy use of MAP. I may get around to it after I digest this meal!
I seem to have struggled with thunks for quite a time now. The point I have reached is to think of them as a code fragment, held unevaluated, that can be run at some later point. Rather like a defined Name, the formula may be little more than a reference or it may be a major calculation. A thunk that references a LET variable minimises the burden of recalculation.
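In case it helps other readers, a minimal illustration of that mental model, a thunk being nothing more than a zero-argument LAMBDA:

```excel
=LET(
    data,   SEQUENCE(1000),
    thunkλ, LAMBDA(SUMPRODUCT(data, data)),  // deferred: nothing is calculated yet
    thunkλ()                                  // the trailing () triggers the evaluation
)
```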
The key point in this context is that the code invoked by a thunk may itself contain further functions, including thunks. Hence I can pass arrays of thunks to a function, like SCAN, that expects a simple array of values.
Since one is passing around code that represents deferred calculations, it is difficult to know whether it is correct or not. That is where the tricks I described above come in. By isolating a single thunk and evaluating it, one can check that one is on track to return the desired result. It is all pretty mind-stretching though. What I have been aiming at is to embed the complexity into purpose-written helper functions that may be used subsequently without requiring understanding of 'how' they work. That is very different from the traditional spreadsheet paradigm, where the 'how' is blatantly obvious, being no more than grade-school arithmetic, but the 'why' is obscured by the volume of cascading cell formulas.
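The isolation trick mentioned above can be sketched as follows: INDEX pulls one thunk out of the array, and appending () evaluates just that one so the intermediate result can be inspected.

```excel
=LET(
    // An array of five thunks, each deferring a differently-sized vector.
    thunks, MAP(SEQUENCE(5), LAMBDA(k, LAMBDA(SEQUENCE(k)))),
    t,      INDEX(thunks, 3),   // isolate the third thunk
    t()                          // evaluate it alone to check the result
)
```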
BTW, I have just read that the Double Diamond brand of beer is being reintroduced, so soon I may be able to determine whether Double Diamond (hopefully not the indifferent keg-ale with carbon dioxide pressurisation) does still 'work wonders'.
- Patrick2788, Oct 31, 2024, Silver Contributor
I've been working on a function that utilizes thunks. The rough draft of the function utilizes a single thunk to re-shape a matrix by specifying the depth and width.
The general idea:
Start with an input matrix that can be re-shaped into 1 of 2 possibilities. My working solution breaks the matrix into smaller matrices with the help of thunks. I hope to have something to share soon. At present the function is still undergoing rigorous testing and I need to determine what to do in situations where the number of rows and/or columns does not divide evenly.
- PeterBartholomew1, Nov 01, 2024, Silver Contributor
Your post drew my attention to the fact that, although I have given a lot of attention to unravelling arrays of thunks that arise 'naturally' as workarounds for the 'array of array', 'nested arrays' and 'array of ranges' limitations of Excel, I have paid little attention to the modelling of nested multi-dimensional arrays.
This is my attempt to rectify that situation. By all means ignore the post, at least until you have completed your testing. I would hate to come between a man and his rigorous testing regime or even spoil the fun of the chase! I think it is a worthwhile problem because Excel data quite often seems to be prepared as an array of forms, each containing array data. That, despite the wishes of the more database-oriented of our friends, who would prefer to see all data in normalised tables.
In fact, you do show the process of normalising the data as one of your outputs, but my past experience is that the steps that allow the data to be normalised as a starting point for standard solution processes often allow the desired output to be derived directly from the cross-tabulated input.
I will show a picture of my worksheet because that will allow you to see what I have done without necessarily getting tied up with the 'how'.
- Patrick2788, Nov 02, 2024, Silver Contributor
I think the function is at a good point where I'm ready to share. I can think of further improvements that can be made (e.g. a 'direction' parameter to offer both 'scan by column' and 'scan by row' and even the option to return the output as a vector), but the tinkering may never end! Adding parameter checks and more sophisticated error prevention can wait, too.
PeterBartholomew1's EVALTHUNKARRλ unpacked the thunks roughly 10x faster than conventional unpacking with REDUCE.
//Re-shape a matrix by specifying width and depth.
//'shape' - 0 to vertically stack arrays, 1 to horizontally stack arrays.
//'shape' defaults to vertical stack if omitted.
//'pad_with' defaults to "" if omitted.
ReShapeMatrixλ = LAMBDA(matrix, depth, width, [shape], [pad_with],
    LET(
        //Prevent uneven distribution of rows and/or columns by
        //ensuring dimensions are properly rounded according to
        //depth and width.
        i, CEILING.MATH(ROWS(matrix), depth),
        j, CEILING.MATH(COLUMNS(matrix), width),
        //Set padding and expand matrix as needed.
        padding, IF(ISOMITTED(pad_with), "", pad_with),
        M, EXPAND(matrix, i, j, padding),
        //Create two matrices which serve as row and column coordinates
        //for INDEX.
        M_rows, SEQUENCE(i / depth, , , depth) * SEQUENCE(, j / width, , 0),
        M_cols, SEQUENCE(, j / width, , width) * SEQUENCE(i / depth, , , 0),
        //Generate 'r' and 'c' coordinates based on starting row (v₁) and column (v₂).
        //Retrieve 'block' from matrix as a vector and stuff in a thunk.
        CreateVectors, LAMBDA(v₁, v₂,
            LET(
                r, TOCOL(SEQUENCE(depth, , v₁) * SEQUENCE(, width, , 0)),
                c, TOCOL(SEQUENCE(, width, v₂) * SEQUENCE(depth, , , 0)),
                block, INDEX(M, r, c),
                THUNK(block)
            )
        ),
        //Map and convert to vector to prepare for unpacking.
        thunks, TOCOL(MAP(M_rows, M_cols, CreateVectors)),
        //Unpack thunks.
        unpacked, EVALTHUNKARRλ(thunks),
        //Wrap vector according to desired shape.
        vshaped, WRAPROWS(unpacked, depth * width),
        hshaped, WRAPCOLS(unpacked, depth * width),
        reshaped, IF(
            ISOMITTED(shape),
            vshaped,
            IF(shape = 1, hshaped, vshaped)
        ),
        reshaped
    )
);
- joelb95, Oct 31, 2024, Brass Contributor
(P.S. Updated chunk function to proper version)
Funnily enough I've been working on something somewhat similar, but mine is solidly within the bucket of what I'd call chunking. The primary solution I'm currently working towards involves precalculating the spatial arrangement of the chunks and then making iterators by way of things like MAP, BYCOL, or BYROW. For the most part I do fancy accumulating stuff inside of a REDUCE combined with an iter function, but back in the day I was using thunks of thunks as list-like objects and then converting those lists into 2D arrays by use of a layout engine (which included array-dimension normalization logic and permitted the combination of arrays with different dimensions without unanticipated fill values). The only reason I haven't stuck with the thunks is because you can't meaningfully persist them, and their use case is basically limited to the exact formula you are currently constructing.
What makes thunks helpful is that they are essentially functions and can be treated as values up until a context where you want them to be evaluated (accomplished by the "()" syntax). Any Excel context which permits you to evaluate a function will give you access to the contents, so it all comes down to creatively playing the shell game of unwinding your thunks.
Also, consider stacking thunks in a 2-column array where the first column is some sort of identifier and you use a selection mechanism that can select the thunk you want by the identifier - it is much easier than having to iterate through each thunk to find out what it is. And for what it is worth, you can very easily create a thunk, calculate some value from that thunk, and then save that value as a thunk. If you accumulate those values with your thunk, you can think of it as a caching mechanism and take advantage of key-value pairs.
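A sketch of that keyed arrangement (all names are illustrative, and it assumes, as recent builds allow, that lambdas can be stacked into an array): XMATCH finds the identifier and INDEX retrieves the matching thunk.

```excel
=LET(
    keys,   {"sales"; "costs"},                                    // identifier column
    thunks, VSTACK(LAMBDA(SEQUENCE(10)), LAMBDA(-SEQUENCE(10))),   // parallel column of thunks
    pickλ,  LAMBDA(key, INDEX(thunks, XMATCH(key, keys))),         // select a thunk by identifier
    LET(t, pickλ("costs"), t())                                    // retrieve it, then evaluate
)
```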
And just because, here is my latest iteration of an array chunker/slicer. Using this in an iter and then aggregating each chunk as a thunk is something even a punk could do.
//sel module
//for taking portions of other things.
chunk = LAMBDA(
    target_array,
    [start_at_row], [end_at_row],
    [start_at_col], [end_at_col],
    [no_of_next_rows_to_include], [no_of_next_cols_to_include],
    [rows_to_stop_from_end_of_array], [cols_to_stop_from_end_of_array],
    [start_at_rows_before_end_of_slice], [start_at_cols_before_end_of_slice],
    [start_at_rows_before_end_of_array], [start_at_cols_before_end_of_array],
    [accept_partial_df_FALSE],
    [return_first_row_if_no_start_df_FALSE], [return_first_col_if_no_start_df_FALSE],
    [return_last_row_if_no_end_df_FALSE], [return_last_col_if_no_end_df_FALSE],
    LET(
        sliced_rows, sliceRows(
            target_array, start_at_row, end_at_row,
            no_of_next_rows_to_include, rows_to_stop_from_end_of_array,
            start_at_rows_before_end_of_slice, start_at_rows_before_end_of_array,
            accept_partial_df_FALSE,
            return_first_row_if_no_start_df_FALSE, return_last_row_if_no_end_df_FALSE
        ),
        error_in_row_slice, ISERROR(INDEX(sliced_rows, 1, 1)),
        sliced_rows_and_columns, IF(
            error_in_row_slice,
            ERROR.TYPE(3),
            sliceCols(
                sliced_rows, start_at_col, end_at_col,
                no_of_next_cols_to_include, cols_to_stop_from_end_of_array,
                start_at_cols_before_end_of_slice, start_at_cols_before_end_of_array,
                accept_partial_df_FALSE,
                return_first_col_if_no_start_df_FALSE, return_last_col_if_no_end_df_FALSE
            )
        ),
        sliced_rows_and_columns
    )
);

sliceRows = LAMBDA(
    target_array,
    [start_at_row], [end_at_row],
    [no_of_next_rows_to_include], [rows_to_stop_from_end_of_array],
    [start_at_rows_before_end_of_slice], [start_at_rows_before_end_of_array],
    [accept_partial_df_FALSE],
    [return_first_if_no_start_df_FALSE], [return_last_if_no_end_df_FALSE],
    LET(
        row_bounds, calcSliceStartAndEnd(
            ROWS(target_array),
            start_at_row, end_at_row,
            no_of_next_rows_to_include, rows_to_stop_from_end_of_array,
            start_at_rows_before_end_of_slice, start_at_rows_before_end_of_array,
            return_first_if_no_start_df_FALSE, return_last_if_no_end_df_FALSE,
            accept_partial_df_FALSE
        ),
        start_row, INDEX(row_bounds, 1, 1),
        end_row, INDEX(row_bounds, 1, 2),
        return_trimmed_array, COLUMNS(row_bounds) = 3,
        trimmed_to_starting_row, IF(return_trimmed_array, DROP(target_array, start_row - 1), FALSE),
        trimmed_to_ending_row, IF(return_trimmed_array, TAKE(trimmed_to_starting_row, end_row - start_row + 1), FALSE),
        result, IF(return_trimmed_array, trimmed_to_ending_row, ERROR.TYPE(3)),
        result
    )
);

sliceCols = LAMBDA(
    target_array,
    [start_at_col], [end_at_col],
    [no_of_next_cols_to_include], [cols_to_stop_from_end_of_array],
    [start_at_cols_before_end_of_slice], [start_at_cols_before_end_of_array],
    [accept_partial_df_FALSE],
    [return_first_col_if_no_start_df_FALSE], [return_last_col_if_no_end_df_FALSE],
    LET(
        col_bounds, calcSliceStartAndEnd(
            COLUMNS(target_array),
            start_at_col, end_at_col,
            no_of_next_cols_to_include, cols_to_stop_from_end_of_array,
            start_at_cols_before_end_of_slice, start_at_cols_before_end_of_array,
            return_first_col_if_no_start_df_FALSE, return_last_col_if_no_end_df_FALSE,
            accept_partial_df_FALSE
        ),
        start_col, INDEX(col_bounds, 1, 1),
        end_col, INDEX(col_bounds, 1, 2),
        return_trimmed_array, COLUMNS(col_bounds) = 3,
        trimmed_to_starting_col, IF(return_trimmed_array, DROP(target_array, , start_col - 1), FALSE),
        trimmed_to_ending_col, IF(return_trimmed_array, TAKE(trimmed_to_starting_col, , end_col - start_col + 1), FALSE),
        result, IF(return_trimmed_array, trimmed_to_ending_col, ERROR.TYPE(3)),
        result
    )
);

calcSliceStartAndEnd = LAMBDA(
    main_dimension_size,
    [start_at_idx], [end_at_idx],
    [end_units_from_start_of_slice], [end_units_from_end_of_dimension],
    [start_units_from_end_of_slice], [start_units_from_end_of_dimension],
    [return_first_if_no_from_df_FALSE], [return_last_if_no_through_df_FALSE],
    [accept_partial_df_FALSE],
    LET(
        // Parameter usage indicators
        start_idx_provided, NOT(ISOMITTED(start_at_idx)),
        calc_start_idx_from_end_of_slice, NOT(ISOMITTED(start_units_from_end_of_slice)),
        calc_start_idx_from_end_of_dimension, NOT(ISOMITTED(start_units_from_end_of_dimension)),
        start_from_first_unit_as_default, IF(ISOMITTED(return_first_if_no_from_df_FALSE), FALSE, return_first_if_no_from_df_FALSE),
        end_idx_provided, NOT(ISOMITTED(end_at_idx)),
        calc_end_idx_from_start_of_slice, NOT(ISOMITTED(end_units_from_start_of_slice)),
        calc_end_idx_from_end_of_dimension, NOT(ISOMITTED(end_units_from_end_of_dimension)),
        end_at_final_unit_as_default, IF(ISOMITTED(return_last_if_no_through_df_FALSE), FALSE, return_last_if_no_through_df_FALSE),
        at_least_one_relative_position, calc_start_idx_from_end_of_slice * calc_end_idx_from_start_of_slice,
        accept_partial, IF(ISOMITTED(accept_partial_df_FALSE), FALSE, accept_partial_df_FALSE),
        // Valid parameter checking
        valid_start_parameter_provided,
            (start_idx_provided + calc_start_idx_from_end_of_slice + calc_start_idx_from_end_of_dimension) = 1,
        valid_end_parameter_provided,
            (end_idx_provided + calc_end_idx_from_start_of_slice + calc_end_idx_from_end_of_dimension) = 1,
        valid_relative_start, calc_start_idx_from_end_of_slice * (end_idx_provided + calc_end_idx_from_end_of_dimension),
        valid_relative_end, calc_end_idx_from_start_of_slice * (start_idx_provided + calc_start_idx_from_end_of_dimension),
        valid_relative_parameter, IF(
            at_least_one_relative_position,
            (valid_relative_start + valid_relative_end) = 1,
            TRUE
        ),
        valid_parameters,
            (valid_start_parameter_provided + start_from_first_unit_as_default) *
            (valid_end_parameter_provided + end_at_final_unit_as_default) *
            valid_relative_parameter,
        result, IF(
            NOT(valid_parameters),
            ERROR.TYPE(3),
            LET(
                abs_start, IFS(
                    valid_relative_start, FALSE,
                    start_idx_provided, start_at_idx,
                    calc_start_idx_from_end_of_dimension, main_dimension_size - start_units_from_end_of_dimension,
                    1, ERROR.TYPE(3)
                ),
                abs_end, IFS(
                    valid_relative_end, FALSE,
                    end_idx_provided, end_at_idx,
                    calc_end_idx_from_end_of_dimension, main_dimension_size - end_units_from_end_of_dimension
                ),
                start_idx, IF(valid_relative_start, abs_end - start_units_from_end_of_slice - 1, abs_start),
                end_idx, IF(valid_relative_end, abs_start + end_units_from_start_of_slice - 1, abs_end),
                is_valid_slice, AND(
                    end_idx >= start_idx,
                    start_idx > 0,
                    ((end_idx <= main_dimension_size) + accept_partial)
                ),
                IF(
                    is_valid_slice,
                    HSTACK(start_idx, end_idx, is_valid_slice),
                    ERROR.TYPE(3)
                )
            )
        ),
        result
    )
);

- PeterBartholomew1, Nov 01, 2024, Silver Contributor
Thank you for your contribution to the discussion. My somewhat tenuous understanding of what you are doing with your chunk function is that you stay in the world of real arrays but build complex relationships between indices to relate the output arrays to the appropriate terms of the input arrays. Is that right?
"The only reason I haven't stuck with the thunks is because you can't meaningfully persist them and their use case is basically limited to the exact formula you are currently constructing"
The first part is true and, in part, is one of the reasons I am moving to larger and larger blocks of functionality within a single Lambda function; possibly one formula cell per worksheet.
As for the second part, one of my goals is to conceal the complexity of thunks from the developer/user. I have specifically targeted the 'array of array' problem, which appears to sacrifice future functionality in order to maintain compatibility with past practice (though I doubt that any workbooks that would be affected even exist). I have written versions of the Lambda helper functions that have the same argument signature as the helper function they call, but which then convert the user's arguments, including their Lambda function, to work with thunks. The process is reasonably efficient, but I would prefer Microsoft to implement a STACK function that would expand any 2D array of thunks; no calculation I could perform would be a match for a native function in compiled code.
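As a sketch of the operation such a native STACK would perform, here is the straightforward (and admittedly slow) REDUCE formulation for a single column of thunks, each holding a 2D array; the name EvalThunkColλ is illustrative only.

```excel
// Sketch only: expand a column of thunks by evaluating and stacking each in turn.
// A native, compiled equivalent is what is being wished for above.
EvalThunkColλ = LAMBDA(thunks,
    DROP(
        REDUCE("", thunks,
            LAMBDA(acc, thunkλ, VSTACK(acc, thunkλ()))   // force each thunk, stack its array
        ),
        1   // discard the "" seed row; unequal widths are padded with #N/A by VSTACK
    )
);
```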
- PeterBartholomew1, Oct 31, 2024, Silver Contributor
If you haven't already looked at it, the module EVALTHUNKARRλ from
A version of Excel MAP helper function that will return an array of arrays
was intended to expand any 2D array of thunks representing embedded 2D arrays.
The patterns you illustrate are typical of multidimensional arrays. There, TOCOL will establish a possible order for the coefficients for the multiply-nested indices. WRAPROWS, WRAPCOLS, TRANSPOSE and TOROW (by row or column) will all reshape the array but the cyclic order of the indices remains unaltered. To change that, you either need to use SORT on one of the indices or to reduce the dimensionality using thunks and reshape the overall array with the internal thunk dimensions 'frozen in'.
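A small sketch of that point: reflowing with TOCOL and WRAPROWS leaves the cyclic order of the indices intact, whereas the scan_by_column option of TOCOL is one way to change which index varies fastest.

```excel
=LET(
    M, SEQUENCE(3, 4),                       // 3×4 array, values 1..12 laid out by row
    flat, TOCOL(M),                          // row-major scan: the column index varies fastest
    same, WRAPROWS(flat, 4),                 // reshaped, but the cyclic order is unchanged
    swapped, WRAPROWS(TOCOL(M, , TRUE), 3),  // scan by column: the row index now varies fastest
    same
)
```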
If this is irrelevant to the challenge you are addressing please ignore it. Either way, I look forward to seeing what you have done.