Forum Discussion
Excel Labs Array Module - What are your thoughts?
Sorry for what will seem like a code dump, but I'm curious if any of you have tried to create similar modules or if you have any comments/wisdom to share about the current incarnation of my module. I recently ran it through chatgpt, so I'm not sure if it slipped in any errors - the structure should, however, be largely accurate and give enough details to let you understand what I tried to accomplish.
Do you notice any shortcomings, obvious enhancements, or alternate approaches to the functions? In particular, I am always concerned about the alternate ways to handle array functions and whether my intuition about formula efficiencies is in the right place.
If there is enough interest, I may share some of my other modules.
// arr module
// This module provides a suite of array manipulation functions to enhance and extend Excel's native capabilities.
// The functions are scoped under the `arr.` namespace to prevent naming conflicts with Excel's built-in functions, ensuring reliable use throughout any workbook.
// Function names have been carefully selected to avoid ambiguity or collision with Excel's native features, especially when referenced internally without the `arr.` prefix.
// Below is an overview of the functions provided in this module, organized by their core functionalities:
// 1. Basic Information (Public Interface)
// These functions provide basic array analysis and selection tools.
// - dimensions: Returns the number of rows and columns in an array, optionally including headers.
// - getColumnIdxByName: Retrieves column indices from an array based on header names.
// - uniqueElements: Extracts unique elements from an array, returning them either as a row or column.
// - countsByElement: Counts occurrences of elements in an array with options for ignoring blanks, errors, and sorting.
// 2. Comparisons (Public Interface)
// Functions that allow for comparison between arrays and columns.
// - areEqualDimension: Checks if two arrays have equal dimensions (width, height, or size).
// - compareColumns: Compares columns of an array based on a value and a specified operator.
// - getDiffDimensionFunc: Calculates the difference in dimensions (width, height, or size) between two arrays.
// 3. Miscellaneous Functions (Public Interface)
// General functions for filling arrays and creating values.
// - fillArray: Fills an array with specified text over a defined number of rows and columns.
// 4. Core Operations (Public Interface)
// These high-level array manipulation functions are designed for direct user interaction and support common array tasks.
// Basic Combination and Addition
// - stack: Stack two arrays either vertically or horizontally.
// - stackOn: Stack arrays with user-specified placement (e.g., above, below, left, right).
// - stackAndExpand: Stack two arrays while expanding dimensions to match as needed.
// Subset Selection and Deletion
// - sliceCols: Extract or remove specific columns from an array.
// - sliceRows: Extract or remove specific rows from an array.
// - trimValue: Trim specified values (e.g., blanks) from rows or columns.
// 5. Complex Transformations (Public Interface)
// These functions enable higher-level array manipulations such as flattening, replacing, or transforming data.
// - flatten: Convert a two-dimensional array into a one-dimensional list, with options for sorting and filtering.
// - replaceBlankCells: Replace blank cells in an array with a specified value.
// - replaceCell: Replace specific values in an array based on a condition.
// - replaceCols: Replace or insert entire columns in an array with options to match dimensions.
// - replaceRows: Replace or insert entire rows in an array with options to match dimensions.
// 6. Helper Functions (Internal Use)
// These internal-use functions assist with specific operations and are prefixed with an underscore to denote their private nature.
// Dimension and Size Helpers
// - _areSameHeight: Checks if two arrays have the same height.
// - _areSameWidth: Checks if two arrays have the same width.
// - _areSameSize: Checks if two arrays have the same size.
// - _ensureHeight: Ensure an array has the same or greater height than a reference array.
// - _ensureWidth: Ensure an array has the same or greater width than a reference array.
// - _diffHeight: Calculates the height difference between two arrays.
// - _diffWidth: Calculates the width difference between two arrays.
// - _diffSize: Calculates the size difference (width and height) between two arrays.
// - _maxHeight: Gets the maximum height between two arrays.
// - _maxWidth: Gets the maximum width between two arrays.
// Stacking Logic Helpers
// - _stackSwitch: Determines stacking behavior (e.g., above, below, left, right) based on user input.
// - _stackAndExpandSwitch: Expands dimensions as necessary before stacking based on user preference.
// - _stackAndExpandHeight: Expands and stacks arrays by height.
// - _stackAndExpandWidth: Expands and stacks arrays by width.
// - _stackAndExpandAllDimensions: Expands and stacks arrays in both dimensions (width and height).
// Basic Information
dimensions =
lambda(
target_array,
[show_names_df_FALSE],
if(
if(
isomitted(show_names_df_FALSE),
FALSE,
show_names_df_FALSE
),
vstack(hstack("rows", "columns"), hstack(rows(target_array), columns(target_array))),
hstack(rows(target_array), columns(target_array))
)
);
getColumnIdxByName =
lambda(
array_with_headers,
column_names_row,
hstack(bycol(column_names_row, lambda(column_name, match(column_name, take(array_with_headers,1),0))))
);
uniqueElements =
lambda(
target_array,
[return_as_col_bool_df_TRUE],
trimValue(unique(flatten(target_array, return_as_col_bool_df_TRUE)))
);
countsByElement =
lambda(
target_array,
[search_array_df_SELF],
[show_element_values_df_FALSE],
[ignore_blanks_df_FALSE],
[ignore_errors_df_FALSE],
[sort_elements_df_0],
[traverse_cols_first_df_TRUE],
let(
flattened_target_array, flatten(target_array,,ignore_blanks_df_FALSE,ignore_errors_df_FALSE,,sort_elements_df_0,traverse_cols_first_df_TRUE),
flattened_search_array, if(isomitted(search_array_df_SELF), flattened_target_array, flatten(search_array_df_SELF)),
elements, unique(flattened_target_array),
pre_result,
byrow(
elements,
lambda(
element,
iferror(rows(filter(flattened_search_array, flattened_search_array=element)),0)
)
),
result,
if(
if(
isomitted(show_element_values_df_FALSE),FALSE,show_element_values_df_FALSE
),
hstack(elements, pre_result),
pre_result
),
result
)
);
// Comparisons
areEqualDimension = LAMBDA(dimension, array1, array2,
SWITCH(
dimension,
"width", _areSameWidth(array1, array2),
"height", _areSameHeight(array1, array2),
"size", _areSameSize(array1, array2),
ERROR.TYPE(3)
)
);
compareColumns= LAMBDA(value_row, array_for_comparison, [comparison_operator], [comparison_col_idx], [value_col_idx],
LET(
operator, IF(ISOMITTED(comparison_operator), "=", comparison_operator),
comp_func, mask.comparisonFunc(operator), // getCompFunc will return #VALUE! for invalid operators
col_idx, IF(ISOMITTED(comparison_col_idx), 1, comparison_col_idx),
val_idx, IF(ISOMITTED(value_col_idx), 1, value_col_idx),
comp_value, IF(COLUMNS(value_row) > 1, CHOOSECOLS(value_row, val_idx), value_row),
comp_array, CHOOSECOLS(array_for_comparison, col_idx),
IF(comp_func = ERROR.TYPE(3), ERROR.TYPE(3), comp_func(comp_value, comp_array)) // Propagate #VALUE! if operator is invalid
)
);
getDiffDimensionFunc = LAMBDA(dimension, array1, array2,
SWITCH(
dimension,
"width", _diffWidth(array1, array2),
"height", _diffHeight(array1, array2),
"size", _diffSize(array1, array2),
ERROR.TYPE(3)
)
);
// Miscellaneous functions
fillArray = LAMBDA(r, c, txt, MAKEARRAY(r, c, LAMBDA(row, col, txt)));
// Stack Functions
stack = lambda(array_1, array_2, [vstack_bool_df_TRUE],
if(
if(
isomitted(vstack_bool_df_TRUE),
TRUE,
vstack_bool_df_TRUE
),
vstack(array_1, array_2),
hstack(array_1, array_2)
)
);
stackOn =
lambda(
array_to_stack, fixed_array, [stack_placement_df_RIGHT], [match_shared_dimensions_df_TRUE], [fill_value_df_DBQT],
let(
match_shared_dimension, if(isomitted(match_shared_dimensions_df_TRUE),TRUE,match_shared_dimensions_df_TRUE),
result,
if(
match_shared_dimension,
_stackAndExpandSwitch(array_to_stack, fixed_array, stack_placement_df_RIGHT, fill_value_df_DBQT),
_stackSwitch(array_to_stack, fixed_array, stack_placement_df_RIGHT)
),
result
)
);
stackAndExpand =
lambda(array1, array2, [exp_width_bool_df_TRUE], [fill_value_df_blank], [exp_height_bool_df_TRUE], [vstack_bool_df_TRUE],
let(
expand_width, IF(ISOMITTED(exp_width_bool_df_TRUE), TRUE, exp_width_bool_df_TRUE),
expand_height, IF(ISOMITTED(exp_height_bool_df_TRUE), TRUE, exp_height_bool_df_TRUE),
stack_bool, if(ISOMITTED(vstack_bool_df_TRUE), TRUE, vstack_bool_df_TRUE),
result,
ifs(
expand_height * expand_width,
_stackAndExpandAllDimensions(array1, array2, fill_value_df_blank, stack_bool),
expand_height,
_stackAndExpandHeight(array1, array2, fill_value_df_blank, stack_bool),
expand_width,
_stackAndExpandWidth(array1, array2, fill_value_df_blank, stack_bool),
1,
ERROR.TYPE(3)
),
result
)
);
// Subset selection and Deletion
getColumnsByName =
lambda(
array_with_headers,
column_names_row,
choosecols(drop(array_with_headers,1),getColumnIdxByName(array_with_headers,column_names_row))
);
getNonZeroCells = LAMBDA(target_row_or_col,
LET(is_not_zero, is.notZero(target_row_or_col), FILTER(target_row_or_col, is_not_zero, ""))
);
sliceCols =
LAMBDA(
original_array,
no_columns_to_drop,
[no_of_columns_to_take],
[no_columns_to_drop_from_end],
LET(
after_first_drop, DROP(original_array, , no_columns_to_drop),
after_take,
IF(
ISOMITTED(no_of_columns_to_take),
after_first_drop,
TAKE(after_first_drop, , no_of_columns_to_take)
),
after_second_drop,
IF(
ISOMITTED(no_columns_to_drop_from_end),
after_take,
DROP(after_take, ,-no_columns_to_drop_from_end)
),
after_second_drop
)
);
sliceRows =
LAMBDA(
original_array,
no_rows_to_drop,
[no_rows_to_take],
[no_rows_to_drop_from_end],
LET(
after_first_drop, DROP(original_array, no_rows_to_drop),
after_take,
IF(
ISOMITTED(no_rows_to_take),
after_first_drop,
TAKE(after_first_drop, no_rows_to_take)
),
after_second_drop,
IF(
ISOMITTED(no_rows_to_drop_from_end),
after_take,
DROP(after_take, ,-no_rows_to_drop_from_end)
),
after_second_drop
)
);
trimValue =
lambda(
target_row_or_col,
[trim_value_df_BLANK],
let(
trim_mask,
if(
isomitted(trim_value_df_BLANK),
not(isblank(target_row_or_col)),
not(target_row_or_col = trim_value_df_BLANK)
),
filter(target_row_or_col, trim_mask,"")
)
);
// Complex Transformations
flatten = LAMBDA(
target_array,
[return_as_column_bool_df_TRUE],
[ignore_blanks_df_FALSE],
[ignore_errors_df_FALSE],
[unique_elements_only_df_FALSE],
[sort_elements_df_0],
[traverse_cols_first_df_TRUE],
LET(
make_column_bool,
IF(ISOMITTED(return_as_column_bool_df_TRUE), TRUE, return_as_column_bool_df_TRUE),
ignore_blanks,
IF(ISOMITTED(ignore_blanks_df_FALSE), FALSE, ignore_blanks_df_FALSE),
ignore_errors,
IF(ISOMITTED(ignore_errors_df_FALSE), FALSE, ignore_errors_df_FALSE),
ignore_value,
(ignore_blanks * 1) + (ignore_errors * 2),
traverse_cols_first,
if(isomitted(traverse_cols_first_df_TRUE),TRUE,traverse_cols_first_df_TRUE),
pre_result,
IF(
make_column_bool,
TOCOL(target_array, ignore_value, traverse_cols_first),
TOROW(target_array, ignore_value, traverse_cols_first)
),
unique_elements_only_bool,
if(isomitted(unique_elements_only_df_FALSE), FALSE, unique_elements_only_df_FALSE),
sort_elements_value,
if(isomitted(sort_elements_df_0), 0, sort_elements_df_0),
after_unique_result,
if(unique_elements_only_bool, unique(pre_result), pre_result),
after_sort_result,
switch(
sort_elements_value,
0,
after_unique_result,
1,
sort(after_unique_result),
-1,
sort(after_unique_result,, -1),
error.type(3)
),
after_sort_result
)
);
replaceBlankCells =
LAMBDA(
array,
[replacement_value],
MAP(
array,
LAMBDA(
cur_cell,
IF(
ISBLANK(cur_cell),
IF(ISOMITTED(replacement_value), "", replacement_value),
cur_cell
)
)
)
);
replaceCell =
LAMBDA(
array,
target_cell_value,
replacement_value,
[comparison_operator],
MAP(
array,
LAMBDA(
cur_cell_value,
let(
comparison_func,
IF(
ISOMITTED(comparison_operator),
mask.comparisonFunc("="),
mask.comparisonFunc(comparison_operator)
),
comparison_result, comparison_func(cur_cell_value, target_cell_value),
if(
comparison_result,
replacement_value,
target_cell_value
)
)
)
)
);
replaceCols =
LAMBDA(
replacement_cols,
original_array,
[target_col_idx],
[insert_bool_default_false],
[trim_to_orig_size_bool_df_FALSE],
[expand_replacement_cols_to_match_rows_df_TRUE],
[expand_original_cols_to_match_rows_df_TRUE],
LET(
col_idx, IF(ISOMITTED(target_col_idx), 1, target_col_idx),
orig_cols, columns(original_array),
insert_bool,
IF(
ISOMITTED(insert_bool_default_false),
FALSE,
insert_bool_default_false
),
adj_orig_array,
if(
if(
isomitted(expand_original_cols_to_match_rows_df_TRUE),
TRUE,
expand_original_cols_to_match_rows_df_TRUE
),
_ensureHeight(replacement_cols,original_array),
original_array
),
adj_replacement_cols,
if(
if(
isomitted(expand_replacement_cols_to_match_rows_df_TRUE),
TRUE,
expand_replacement_cols_to_match_rows_df_TRUE
),
_ensureHeight(original_array,replacement_cols),
replacement_cols
),
first_part,
IF(
col_idx > 1,
hSTACK(TAKE(adj_orig_array, ,col_idx - 1), adj_replacement_cols),
adj_replacement_cols
),
drop_cols,
if(
orig_cols>=col_idx,
if(
insert_bool,
col_idx-1,
col_idx+columns(adj_replacement_cols)-1
),
0
),
combined_parts,
IF(
or(drop_cols=0,drop_cols>orig_cols),
first_part,
hstack(first_part, drop(adj_orig_array, ,drop_cols))
),
if(
if(
isomitted(trim_to_orig_size_bool_df_FALSE),
FALSE,
trim_to_orig_size_bool_df_FALSE
),
take(combined_parts, ,orig_cols),
combined_parts
)
)
);
replaceRows =
LAMBDA(
replacement_rows,
original_array,
[target_row_idx],
[insert_bool_df_false],
[trim_to_orig_size_bool_df_FALSE],
[expand_replacement_rows_to_match_cols_df_TRUE],
[expand_original_rows_to_match_cols_df_TRUE],
LET(
row_idx, IF(ISOMITTED(target_row_idx), 1, target_row_idx),
orig_rows, rows(original_array),
insert_bool,
IF(
ISOMITTED(insert_bool_df_false),
FALSE,
insert_bool_df_false
),
adj_orig_array,
if(
if(
isomitted(expand_original_rows_to_match_cols_df_TRUE),
TRUE,
expand_original_rows_to_match_cols_df_TRUE
),
_ensureWidth(replacement_rows, original_array),
original_array
),
adj_replacement_rows,
if(
if(
isomitted(expand_replacement_rows_to_match_cols_df_TRUE),
TRUE,
expand_replacement_rows_to_match_cols_df_TRUE
),
_ensureWidth(original_array,replacement_rows),
replacement_rows
),
first_part,
IF(
row_idx > 1,
VSTACK(TAKE(adj_orig_array, row_idx - 1), adj_replacement_rows),
adj_replacement_rows
),
drop_rows,
if(
rows(adj_orig_array)>=row_idx,
if(
insert_bool,
row_idx-1,
row_idx+rows(adj_replacement_rows)-1
),
0
),
combined_parts,
IF(
drop_rows<=0,
first_part,
vstack(first_part, drop(adj_orig_array, drop_rows))
),
result,
if(
if(
isomitted(trim_to_orig_size_bool_df_FALSE),
FALSE,
trim_to_orig_size_bool_df_FALSE
),
take(combined_parts, orig_rows),
combined_parts
),
result
)
);
// Dimension and Size Helpers
_areSameHeight = LAMBDA(array1, array2,
ROWS(array1) = ROWS(array2)
);
_areSameWidth = LAMBDA(array1, array2,
COLUMNS(array1) = COLUMNS(array2)
);
_areSameSize = LAMBDA(array1, array2,
AND(_areSameWidth(array1, array2), _areSameHeight(array1, array2))
);
_ensureHeight =
lambda(
reference_array,
expansion_array,
[fill_value_df_DBLQT],
expand(
expansion_array,
max(rows(reference_array), rows(expansion_array)),,
if(isomitted(fill_value_df_DBLQT), "",fill_value_df_DBLQT)
)
);
_ensureWidth =
lambda(
reference_array,
expansion_array,
[fill_value_df_DBLQT],
expand(
expansion_array, ,
max(columns(reference_array), columns(expansion_array)),
if(isomitted(fill_value_df_DBLQT), "",fill_value_df_DBLQT)
)
);
_diffHeight = LAMBDA(array1, array2,
ROWS(array1) - ROWS(array2)
);
_diffWidth = LAMBDA(array1, array2,
COLUMNS(array1) - COLUMNS(array2)
);
_diffSize = LAMBDA(array1, array2,
HSTACK(_diffHeight(array1, array2), _diffWidth(array1, array2))
);
_maxHeight = LAMBDA(arr_1, arr_2,
LET(
arr_1_height, ROWS(arr_1),
arr_2_height, ROWS(arr_2),
max_height, MAX(arr_1_height, arr_2_height),
max_height
)
);
_maxWidth = LAMBDA(arr_1, arr_2,
LET(
arr_1_width, COLUMNS(arr_1),
arr_2_width, COLUMNS(arr_2),
max_width, MAX(arr_1_width, arr_2_width),
max_width
)
);
// Stacking Logic Helpers
_stackSwitch =
lambda(
array_to_stack, fixed_array, stack_placement_df_RIGHT,
switch(
if(isomitted(stack_placement_df_RIGHT),"right",stack_placement_df_RIGHT),
"above",
vstack(array_to_stack, fixed_array),
"below",
vstack(fixed_array, array_to_stack),
"left",
hstack(array_to_stack, fixed_array),
"right",
hstack(fixed_array, array_to_stack),
error.type(3)
)
);
_stackAndExpandSwitch =
lambda(
array_to_stack, fixed_array, stack_placement_df_RIGHT, [fill_value_df_DBQT],
switch(
if(isomitted(stack_placement_df_RIGHT),"right",stack_placement_df_RIGHT),
"above",
_stackAndExpandWidth(array_to_stack, fixed_array,fill_value_df_DBQT),
"below",
_stackAndExpandWidth(fixed_array, array_to_stack, fill_value_df_DBQT),
"left",
_stackAndExpandHeight(array_to_stack, fixed_array,fill_value_df_DBQT),
"right",
_stackAndExpandHeight(fixed_array, array_to_stack, fill_value_df_DBQT),
error.type(3)
)
);
_stackAndExpandHeight =
LAMBDA(array_1, array_2, [fill_value_df_blank], [vstack_bool_df_FALSE],
LET(
max_width, _maxWidth(array_1, array_2),
max_height, _maxHeight(array_1, array_2),
fill_char, IF(ISOMITTED(fill_value_df_blank), "", fill_value_df_blank),
stack_bool,
if(
isomitted(vstack_bool_df_FALSE),
FALSE,
vstack_bool_df_FALSE
),
expanded_array_1, EXPAND(array_1, max_height, , fill_char),
expanded_array_2, EXPAND(array_2, max_height, , fill_char),
stack(expanded_array_1, expanded_array_2, stack_bool)
)
);
_stackAndExpandWidth =
LAMBDA(array_1, array_2, [fill_value_df_blank], [vstack_bool_df_TRUE],
LET(
max_width, _maxWidth(array_1, array_2),
max_height, _maxHeight(array_1, array_2),
fill_char, IF(ISOMITTED(fill_value_df_blank), "", fill_value_df_blank),
stack_bool,
if(
isomitted(vstack_bool_df_TRUE),
FALSE,
vstack_bool_df_TRUE
),
expanded_array_1, EXPAND(array_1, , max_width, fill_char),
expanded_array_2, EXPAND(array_2, , max_width, fill_char),
stack(expanded_array_1, expanded_array_2, stack_bool)
)
);
_stackAndExpandAllDimensions =
LAMBDA(array_1, array_2, [fill_value_df_blank], [vstack_bool_df_TRUE],
LET(
max_width, _maxWidth(array_1, array_2),
max_height, _maxHeight(array_1, array_2),
fill_char, IF(ISOMITTED(fill_value_df_blank), "", fill_value_df_blank),
stack_bool,
if(
isomitted(vstack_bool_df_TRUE),
TRUE,
vstack_bool_df_TRUE
),
expanded_array_1, EXPAND(array_1, max_height, max_width, fill_char),
expanded_array_2, EXPAND(array_2, max_height, max_width, fill_char),
if(stack_bool, vstack(expanded_array_1, expanded_array_2), hstack(expanded_array_1, expanded_array_2))
)
);
Without playing with samples it's hard to comment concrete functions. First in mind, perhaps we may avoid MAP and FILTER in some cases, they are costly from performance point of view. For example,
replaceBlankCells = LAMBDA( array, [replacement_value], LET( repl, IF(ISOMITTED(replacement_value), "", replacement_value), IF( ISBLANK(array), repl, array) ) );
For some could be variants which are bit easier in maintenance, at least from my point of view. e.g.
sliceCols = LAMBDA( original_array, no_columns_to_drop, [no_of_columns_to_take], [no_columns_to_drop_from_end], LET( n, COLUMNS(original_array), start, no_columns_to_drop + 1, cols, MIN(no_of_columns_to_take, n - start + 1) - no_columns_to_drop_from_end, CHOOSECOLS(original_array, SEQUENCE(,cols,start)) ) );
Anyway, set of functions could be useful, thank you for sharing.
- joelb95Brass Contributor
Thanks for taking a look. These functions are designed for working with dynamic arrays, so the array traversal logic is embedded (usually) in each of them. My replaceCell and replaceBlankCells, for instance, can replace any cell in an array (dynamic or otherwise) without having to separately call an array traversal function. Yes, it involves constructing a new array, but as far as I know, each "discrete" calculation creates its own array anyway (e.g. original_array, extra_row(vstack(original_array,"")), extra_row2(vstack(extra_row,"")) creates two new arrays while original_array, vstack(original_array, extra_row, extra_row2) creates only one new array).
The one function that I think is the biggest contributor going forward is the flatten function. It is presumably going to be the basis of all of my set theory module logic.