Weird Results from Array Manipulation - Any ideas on why?

Brass Contributor

Apr 14, 2024

Thank you both for replying. I think djclementes did a better job of mind reading what I wanted in response to a vague question. I might be overreading the response, but it almost sounds like choose is a sort of syntactic sugar around "if". If true, this may be the end of my efforts around what I have been doing involving trying to freely pass arrays from a choose "encapsulation" (multi-dimensional array as one possible choice in a choose lambda) and a "stuffed" array (multi-dimensional array as only possible choice in a choose lambda) in a single dimensional array (like a column) that is then "stuffed" into a list.

Basically anything of the form lambda(a1, ...,a10, lambda(choice, choose(choice, a1, ... , a10)) lets you pass around any desired arrays without "flattening" them. (What I am calling encapsulation.) The current explanation about passing an array to the "choice" variable in a choose function means that even if they are not flattened when stored, there are still ways to inadvertently get excel to flatten them. I'm not quite sure how passing a sequence as the choice variable could result in a jagged array, but I am sensing that an array function is done more "at the same time" than I was imagining.

I'm still not sure why things would be evaluated (and compared) prior to the full return of the function, i.e.

The first array value of 1 (in the bycolumn or sequence(2) reference) should create this formula:

lambda(choice, choose(choice, many(1)))

Shouldn't this mean that content of the sequence/bycol returns is an array of dimensionless functions? e.g. {lambda(x, choose(x, "b")); lambda(x, choose(x, sequence(4,1,5))}

And shouldn't picking out one such function, e.g. index(row_of_functions, 1,1) just return a dimensionless function?

Why would excel know/care about what the content of a non-evaluated function is when constructing the row of functions?

djclements
Silver Contributor
Apr 15, 2024
joelb95 You are correct in your assessment that an undefined LAMBDA function returns nothing until it is called/evaluated. If it is never called, it will return the #CALC! error with the following screen tip:

"Cell contains a lambda... You can't define a lambda in a cell. Define your lambda in the Name Manager and try again using the name."

This is of course misleading, because a custom LAMBDA function can be defined directly in a cell using the LET function.

On its own, the formula =IF({1,0}, "a", LAMBDA(x, SEQUENCE(x))) returns {"a",#CALC!} in a 1 row by 2 column array. Using INDEX(array, 1) will return "a", and INDEX(array, 2) will return #CALC!; however, behind the #CALC! error sits the undefined function, LAMBDA(x, SEQUENCE(x)). When you call the function using INDEX(array, 2)(3), it is evaluated and returns {1;2;3}.

When looking at your original BYCOL example with the custom MANY function:

=LET( many, LAMBDA(choice, CHOOSE(choice + 1, 10, "b", "harry", "c", SEQUENCE(4, 1, 5), SEQUENCE(3), 45, "d", "lots", "e", "ten")), many_in_a_row, BYCOL(SEQUENCE(1, many(0)), LAMBDA(c, LAMBDA(choice, CHOOSE(choice, many(c))))), INDEX(many_in_a_row, 1)(1) )

On its own, the many_in_a_row variable would return {#CALC!,#CALC!,...} in a 1 row by 10 column array. Behind each of those #CALC! errors, though, sits the undefined/uncalled function, LAMBDA(choice, CHOOSE(choice, many(c))), for each column in the array. However, the problem with the BYCOL function is that instead of passing a single value to the many function [many(1), many(2), etc.], it's actually passing an array for the entire column [many({1}), many({2}), etc.].

INDEX(many_in_a_row, 1)(1) returns {"b";"b";"b";"b"}, not because the other columns of BYCOL were evaluated (they weren't), but rather because an array object is being passed to the index_num argument of the underlying CHOOSE function on which the many function was based.

To further illustrate this point, swap out BYCOL with MAP:

=LET( many, LAMBDA(choice, CHOOSE(choice + 1, 10, "b", "harry", "c", SEQUENCE(4, 1, 5), SEQUENCE(3), 45, "d", "lots", "e", "ten")), many_in_a_row, MAP(SEQUENCE(1, many(0)), LAMBDA(c, LAMBDA(choice, CHOOSE(choice, many(c))))), INDEX(many_in_a_row, 1)(1) )

With the MAP function, each iteration passes a single value to the many function [many(1), many(2), etc.]. As a result, INDEX(many_in_a_row, 1)(1) now returns "b", and INDEX(many_in_a_row, 4)(1) returns {5;6;7;8}.

TIP: regarding the INDEX function, when referencing a one-dimensional array or range that contains a single row, the row_num argument automatically becomes the column_num, if the optional [column_num] argument is not used. For example, INDEX(many_in_a_row, 4) returns the 4th column because the array only contains 1 row.
- joelb95
  Brass Contributor
  Apr 15, 2024
  "With the MAP function, each iteration passes a single value to the many function [many(1), many(2), etc.]. As a result, INDEX(many_in_a_row, 1)(1) now returns "b", and INDEX(many_in_a_row, 4)(1) returns {5;6;7;8}."
  
  I haven't tested it yet, but this bit is fascinating. Not that I know what any of these words mean, but is the idea that the byrow/bycol are truly vectorized functions that somehow are running array logic rather than just itered scalar functions? I am not really sophisticated enough to implement the various timing functions I've seen, so I don't know how to answer the question, but do byrow/bycolumn have meaningful efficiencies over mapping a one by X row/column?
  
  Maybe if I just use MAP I'll solve my problems, but if it comes at the cost of a huge loss of efficiency, it may not be all that helpful in the long run.
  
  At some point I'll try to put into words what the bigger project is, but one portion relates to dynamically passing parameters and collecting the array results for later (or perhaps immediate) use. If I have a list of 10,000 things, I would, of course, prefer to use vectors whenever possible - but if what you say is so, then it may be a fool's errand.
  
  Picture, if you will, a simple lambda that you want to extend indefinitely in a way similar to how =sum() can take as many arguments as you care to enter. How do you get an array of arrays to be properly "parameterized"? My current effort (and the genesis of this post) was about the weird result when I try to use byrow/bycolumn as a native excel iter counter. If it would just return the unevaluated formula, it would work swimmingly, but if it is going to return flattened arrays (however it gets there), it is unworkable.

joelb95

Brass Contributor

Apr 14, 2024

djclements SergeiBaklan

Consider, for instance, this formula that does not conflate the array creation with evaluation of the embedded function:

=LET(
array,IF({TRUE,FALSE}, "a", LAMBDA(x, SEQUENCE(x))), 
evaluate_embedded_array_formula, INDEX(array, 1, 2)(3),
return_singleton, INDEX(array, 1,1),
HSTACK(return_singleton, evaluate_embedded_array_formula))

Forum Discussion

Weird Results from Array Manipulation - Any ideas on why?

Resources