Weird Results from Array Manipulation - Any ideas on why?

Brass Contributor

Hi all - doing some exploring with lets and lambda in excel and passing arrays around.  I found a weird quirk where something that should work doesn't work as expected in a surprising way.  This problem also (or primarily) appears with byrow.

 

This formula results in the following results:

 

 

 

 

 

=LET(
    many, LAMBDA(choice,
        CHOOSE(choice + 1, 10, "b", "harry", "c", SEQUENCE(4, 1, 5), SEQUENCE(3), 45, "d", "lots", "e", "ten")
    ),
    INDEX(LAMBDA(x, CHOOSE(x, (many(SEQUENCE(1, 2))))), 1, 1)(1)
)

 

 

 

 

 

 

bharry
bharry
bharry
b

harry

 

Anyone have a guess why?

If you expand the sequence out to 10, it gives this:

 

bharryc5145dlotseten
bharryc6245dlotseten
bharryc7345dlotseten
bharryc8#N/A45dlotseten

 

This is the result for a more direct mapping:

 

 

 

 

 

=LET(
    many, LAMBDA(choice,
        CHOOSE(choice + 1, 10, "b", "harry", "c", SEQUENCE(4, 1, 5), SEQUENCE(3), 45, "d", "lots", "e", "ten")
    ),
    HSTACK(many(1), many(2), many(3), many(4))
)

 

 

 

 

 

bharryc5
#N/A#N/A#N/A6
#N/A#N/A#N/A7
#N/A#N/A#N/A8

 

This is a bycol example:

 

 

 

 

 

=LET(
    many, LAMBDA(choice,
        CHOOSE(choice + 1, 10, "b", "harry", "c", SEQUENCE(4, 1, 5), SEQUENCE(3), 45, "d", "lots", "e", "ten")
    ),
    many_in_a_row, BYCOL(SEQUENCE(1, many(0)), LAMBDA(r, LAMBDA(choice, CHOOSE(choice, many(r))))),
    INDEX(many_in_a_row, 1, 1)(1)
)

 

 

 

 

 

b
b
b
b

 

8 Replies

@joelb95 

Let start from first one

=LET(
    many, LAMBDA(choice,
        CHOOSE(
            choice + 1,
            10,
            "b",
            "harry",
            "c",
            SEQUENCE(4, 1, 5),
            SEQUENCE(3),
            45,
            "d",
            "lots",
            "e",
            "ten"
        )
    ),
    INDEX(
        LAMBDA(x, CHOOSE(x, (many(SEQUENCE(1, 2))))), 1, 1)(1)
)

INDEX returns the pointer on LAMBDA, (1,1) is the only possible combination. Since LAMBDA is first class function we may pass parameter to it after that. Thus that's equivalent of

LAMBDA(x, CHOOSE(x, (many(SEQUENCE(1, 2)))))(1)

If we would like to take something from returned by LAMBDA array, that could be like

=LET(
    many, LAMBDA(choice,
        CHOOSE(
            choice + 1,
            10,
            "b",
            "harry",
            "c",
            SEQUENCE(4, 1, 5),
            SEQUENCE(3),
            45,
            "d",
            "lots",
            "e",
            "ten"
        )
    ),
    INDEX( LAMBDA(x, CHOOSE(x, (many(SEQUENCE(1, 2)))))(1), 1, 1)
)

@joelb95 

Second one

=LET(
    many, LAMBDA(choice,
        CHOOSE(
            choice + 1,
            10,
            "b",
            "harry",
            "c",
            SEQUENCE(4, 1, 5),
            SEQUENCE(3),
            45,
            "d",
            "lots",
            "e",
            "ten"
        )
    ),
    HSTACK(many(1), many(2), many(3), many(4))
)

works as expected. First 3 return single values, many(4) returns 4-rows array. Combining by HSTACK single value and an array doesn't expand single value to the rest of rows, it returns #N/A for the rows which doesn't exist in the specific column.

However, I'm not sure which result do you expect.

@joelb95 

BYCOL. Not sure what you'd like to receive. In general we have array of arrays which is not supported in Excel natively. There are different workarounds, for example

=LET(
    many, LAMBDA(choice,
        CHOOSE(
            choice + 1,
            10,
            "b",
            "harry",
            "c",
            SEQUENCE(4, 1, 5),
            SEQUENCE(3),
            45,
            "d",
            "lots",
            "e",
            "ten"
        )
    ),
    many_in_a_row, REDUCE(
        "",
        SEQUENCE(1, many(0)),
        LAMBDA(a,v, HSTACK(a, many(v)))),
    IFNA(DROP(many_in_a_row, , 1), "")
)

which returns

image.png

From here you may take by INDEX desired value.

@joelb95 Here's some additional information you may find helpful or interesting...

 

(1) The CHOOSE Function:

 

The CHOOSE function handles arrays in very much the same way as the IF function. To illustrate this point, consider the following examples:

 

=IF(TRUE, "a", SEQUENCE(3))

 

Returns:
a

 

=IF(FALSE, "a", SEQUENCE(3))

 

Returns:
1
2
3

 

=IF({TRUE}, "a", SEQUENCE(3))

 

Returns:
a
a
a

 

=IF({TRUE,FALSE}, "a", SEQUENCE(3))

 

Returns: 
a1
a2
a3

 

When a single logical_test is performed, the corresponding result is returned as expected. However, when an array object is passed to the logical_test argument (even if that array only contains a single value), Excel sees this as an attempt to generate multiple sets of results for each item in the array. As such, in order to eliminate the possibility of generating a "jagged array" of results, the value_if_true and value_if_false arguments are automatically resized to be the same dimensions (ie: an argument containing a single value is repeated to fill the same number of rows/columns as the argument with the greatest number of rows/columns).

 

This same behavior can also be observed with the CHOOSE function:

 

=CHOOSE(1, "a", SEQUENCE(3), SEQUENCE(2))

 

Returns:
a

 

=CHOOSE(3, "a", SEQUENCE(3), SEQUENCE(2))

 

Returns:
1
2

 

=CHOOSE({1}, "a", SEQUENCE(3), SEQUENCE(2))

 

Returns:
a
a
a

 

=CHOOSE({1,3}, "a", SEQUENCE(3), SEQUENCE(2))

 

Returns: 
a1
a2
a#N/A

 

In the examples shown above, out of the three possible value arguments to be returned, SEQUENCE(3) contains the greatest number of rows. When an array object is then passed to the index_num argument, the other two value arguments are automatically resized with the same number of rows as SEQUENCE(3). In order to prevent a "jagged array" of results from being returned, "a" is repeated 3 times to fill 3 rows and SEQUENCE(2) is resized to fit 3 rows. Note: SEQUENCE(2) cannot be repeated 3 times because it already contains multiple rows, which would overlap each other if repeated. Instead, it is "expanded" to fit 3 rows, with the additional rows returning #N/A.

 

(2) The BYROW and BYCOL Functions:

 

  • BYROW and BYCOL cannot produce an array of arrays (nested arrays are not supported)
  • the LAMBDA function applied to each row or column must produce a single result
  • each iteration references an entire row or column (even when using a one-dimensional array)

 

For example, with =BYCOL(SEQUENCE(1, 3), LAMBDA(c, ...)), the SEQUENCE(1, 3) function returns an array object consisting of {1,2,3}. While you might expect each iteration of the c variable to return a single value from 1 to 3, it actually returns another array consisting of a single value from {1} to {3}, which represents an entire column in the array. As discussed previously, when an array object is then passed to the index_num argument of the CHOOSE function, multiple results can be returned (as is the case with your custom MANY function).

 

Alternatively, the MAP function can be better suited for working with one-dimensional arrays, because the LAMBDA function is applied to each value in the array individually. For example, with =MAP(SEQUENCE(1, 3), LAMBDA(n, ...)), each iteration of the n variable will in fact return a single value from 1 to 3.

 

(3) Alternative Methods (Array of Arrays):

 

MAP with TEXTJOIN, plus TEXTBEFORE and TEXTAFTER:

 

=LET(
    many, LAMBDA(choice, CHOOSE(choice + 1, 10, "b", "harry", "c", SEQUENCE(4, 1, 5), SEQUENCE(3), 45, "d", "lots", "e", "ten")),
    arr, MAP(SEQUENCE(1, many(0)), LAMBDA(n, TEXTJOIN("|", FALSE, "", many(n), ""))),
    IFERROR(TEXTBEFORE(TEXTAFTER(arr, "|", SEQUENCE(ROWS(many({0})))), "|"), "")
)

 

MAKEARRAY with INDEX:

 

=LET(
    many, LAMBDA(choice, CHOOSE(choice + 1, 10, "b", "harry", "c", SEQUENCE(4, 1, 5), SEQUENCE(3), 45, "d", "lots", "e", "ten")),
    IFERROR(MAKEARRAY(ROWS(many({0})), many(0), LAMBDA(r,c, INDEX(many(c), r))), "")
)

 

REDUCE with HSTACK or VSTACK:

 

=LET(
    many, LAMBDA(choice, CHOOSE(choice + 1, 10, "b", "harry", "c", SEQUENCE(4, 1, 5), SEQUENCE(3), 45, "d", "lots", "e", "ten")),
    IFNA(REDUCE(many(1), SEQUENCE(1, many(0) - 1, 2), LAMBDA(a,v, HSTACK(a, many(v)))), "")
)

 

@djclements @Sergei Baklan 

 

Thank you both for replying. I think djclementes did a better job of mind reading what I wanted in response to a vague question. I might be overreading the response, but it almost sounds like choose is a sort of syntactic sugar around "if". If true, this may be the end of my efforts around what I have been doing involving trying to freely pass arrays from a choose "encapsulation" (multi-dimensional array as one possible choice in a choose lambda) and a "stuffed" array (multi-dimensional array as only possible choice in a choose lambda) in a single dimensional array (like a column) that is then "stuffed" into a list.

Basically anything of the form lambda(a1, ...,a10, lambda(choice, choose(choice, a1, ... , a10)) lets you pass around any desired arrays without "flattening" them. (What I am calling encapsulation.) The current explanation about passing an array to the "choice" variable in a choose function means that even if they are not flattened when stored, there are still ways to inadvertently get excel to flatten them. I'm not quite sure how passing a sequence as the choice variable could result in a jagged array, but I am sensing that an array function is done more "at the same time" than I was imagining.

I'm still not sure why things would be evaluated (and compared) prior to the full return of the function, i.e.

The first array value of 1 (in the bycolumn or sequence(2) reference) should create this formula:

lambda(choice, choose(choice, many(1)))

Shouldn't this mean that content of the sequence/bycol returns is an array of dimensionless functions? e.g. {lambda(x, choose(x, "b")); lambda(x, choose(x, sequence(4,1,5))}

And shouldn't picking out one such function, e.g. index(row_of_functions, 1,1) just return a dimensionless function?

Why would excel know/care about what the content of a non-evaluated function is when constructing the row of functions?

@djclements @SergeiBaklan 

 

Consider, for instance, this formula that does not conflate the array creation with evaluation of the embedded function:

 

=LET(
array,IF({TRUE,FALSE}, "a", LAMBDA(x, SEQUENCE(x))), 
evaluate_embedded_array_formula, INDEX(array, 1, 2)(3),
return_singleton, INDEX(array, 1,1),
HSTACK(return_singleton, evaluate_embedded_array_formula))

 

@joelb95 You are correct in your assessment that an undefined LAMBDA function returns nothing until it is called/evaluated. If it is never called, it will return the #CALC! error with the following screen tip:

 

"Cell contains a lambda... You can't define a lambda in a cell. Define your lambda in the Name Manager and try again using the name."

 

This is of course misleading, because a custom LAMBDA function can be defined directly in a cell using the LET function.

 

On its own, the formula =IF({1,0}, "a", LAMBDA(x, SEQUENCE(x))) returns {"a",#CALC!} in a 1 row by 2 column array. Using INDEX(array, 1) will return "a", and INDEX(array, 2) will return #CALC!; however, behind the #CALC! error sits the undefined function, LAMBDA(x, SEQUENCE(x)). When you call the function using INDEX(array, 2)(3), it is evaluated and returns {1;2;3}.

 

When looking at your original BYCOL example with the custom MANY function:

 

=LET(
    many, LAMBDA(choice, CHOOSE(choice + 1, 10, "b", "harry", "c", SEQUENCE(4, 1, 5), SEQUENCE(3), 45, "d", "lots", "e", "ten")),
    many_in_a_row, BYCOL(SEQUENCE(1, many(0)), LAMBDA(c, LAMBDA(choice, CHOOSE(choice, many(c))))),
    INDEX(many_in_a_row, 1)(1)
)

 

On its own, the many_in_a_row variable would return {#CALC!,#CALC!,...} in a 1 row by 10 column array. Behind each of those #CALC! errors, though, sits the undefined/uncalled function, LAMBDA(choice, CHOOSE(choice, many(c))), for each column in the array. However, the problem with the BYCOL function is that instead of passing a single value to the many function [many(1), many(2), etc.], it's actually passing an array for the entire column [many({1}), many({2}), etc.].

 

INDEX(many_in_a_row, 1)(1) returns {"b";"b";"b";"b"}, not because the other columns of BYCOL were evaluated (they weren't), but rather because an array object is being passed to the index_num argument of the underlying CHOOSE function on which the many function was based.

 

To further illustrate this point, swap out BYCOL with MAP:

 

=LET(
    many, LAMBDA(choice, CHOOSE(choice + 1, 10, "b", "harry", "c", SEQUENCE(4, 1, 5), SEQUENCE(3), 45, "d", "lots", "e", "ten")),
    many_in_a_row, MAP(SEQUENCE(1, many(0)), LAMBDA(c, LAMBDA(choice, CHOOSE(choice, many(c))))),
    INDEX(many_in_a_row, 1)(1)
)

 

With the MAP function, each iteration passes a single value to the many function [many(1), many(2), etc.]. As a result, INDEX(many_in_a_row, 1)(1) now returns "b", and INDEX(many_in_a_row, 4)(1) returns {5;6;7;8}.

 

TIP: regarding the INDEX function, when referencing a one-dimensional array or range that contains a single row, the row_num argument automatically becomes the column_num, if the optional [column_num] argument is not used. For example, INDEX(many_in_a_row, 4) returns the 4th column because the array only contains 1 row.

"With the MAP function, each iteration passes a single value to the many function [many(1), many(2), etc.]. As a result, INDEX(many_in_a_row, 1)(1) now returns "b", and INDEX(many_in_a_row, 4)(1) returns {5;6;7;8}."

I haven't tested it yet, but this bit is fascinating. Not that I know what any of these words mean, but is the idea that the byrow/bycol are truly vectorized functions that somehow are running array logic rather than just itered scalar functions? I am not really sophisticated enough to implement the various timing functions I've seen, so I don't know how to answer the question, but do byrow/bycolumn have meaningful efficiencies over mapping a one by X row/column?

Maybe if I just use MAP I'll solve my problems, but if it comes at the cost of a huge loss of efficiency, it may not be all that helpful in the long run.

At some point I'll try to put into words what the bigger project is, but one portion relates to dynamically passing parameters and collecting the array results for later (or perhaps immediate) use. If I have a list of 10,000 things, I would, of course, prefer to use vectors whenever possible - but if what you say is so, then it may be a fool's errand.

Picture, if you will, a simple lambda that you want to extend indefinitely in a way similar to how =sum() can take as many arguments as you care to enter. How do you get an array of arrays to be properly "parameterized"? My current effort (and the genesis of this post) was about the weird result when I try to use byrow/bycolumn as a native excel iter counter. If it would just return the unevaluated formula, it would work swimmingly, but if it is going to return flattened arrays (however it gets there), it is unworkable.