ByRow function produces two different results depending on whether row is range or array

Brass Contributor

In a nutshell, I am trying to compute average with a combination of ByRow, Index, and Average. I needed a sequence number array for this algorithm. When I put this sequence number in a range and use that range in ByRow function, it works great. But when I use that sequence function directly inside a Let(), it gives me wrong answers.

 

I am sharing the codes I have tried, below:

 

This is the first formula I wrote:

 

 

 

 

=LET(
    Values, SORTBY(Data[Sensex], Data[DATE]),
    Sqn, SEQUENCE(ROWS(Values)),
    BYROW(
        Sqn,
        LAMBDA(a,
            IF(
                a < 7,
                "NA",
                AVERAGE(
                    INDEX(Values, SEQUENCE(7, 1, a - 7 + 1))
                )
            )
        )
    )
)​

 

 

 

 

This one produced "a" set of answers. But the answers were slightly incorrect.  To figure out where the problem was, all I did was put the sequences in a separate range and use that range in the ByRow function and this time I got perfect results:

 

So, this is the first part to get the sequence of numbers

 

 

 

In cell, say E4

=LET(
    Values, SORTBY(Data[Sensex], Data[DATE]),
    Sqn, SEQUENCE(ROWS(Values)),
    Sqn
)

 

 

 

 

And this is the second part, where I use the ByRow function and specify the range as an array

 

 

 

 

In cell, say F4

=LET(
    Values, SORTBY(Data[S&P BSE Sensex], Data[DATE]),
    BYROW(
        E4#,
        LAMBDA(a,
            IF(
                a < 7,
                "NA",
                AVERAGE(
                    INDEX(Values, SEQUENCE(7, 1, a - 7 + 1))
                )
            )
        )
    )
)

 

 

 

 

And now I am getting the perfect answer. Is this a bug or am I doing something wrong here:

   @Peter Bartholomew @Craig Hatmaker  @Sergei Baklan @lori_m

 

24 Replies

I don't have an answer why right now but your 'wrong' answer are just your original values NOT averaged offset by 7. That is probably a good clue what is happening.

One more clue: This variation also works:

=LET(Values, SORTBY(Data[S&P BSE Sensex],Data[DATE]),
     Sqn, SEQUENCE(ROWS(Values)),
     Seven, SEQUENCE(7),
     BYROW(Sqn, LAMBDA(a,
         IF(a<7,"NA", AVERAGE(INDEX(Values, a-7+Seven))))))

Basically setting up the {1,2,3,4,5,6,7} array ahead of time and then simple math inside

That is a good catch. So, I think in the bug case, the index is not returning the entire array but it is just returning the first value.

@Viz  a couple more interesting nuances.  Sending an arbitrary array reference in and then converting that to a number array using ROW() works:

=LET(Values, SORTBY(Data[S&P BSE Sensex],Data[DATE]),
     sqn,A1:INDEX(A:A,ROWS(Values)),
     BYROW(sqn, LAMBDA(in,
           LET(a,ROW(in),
               IF(a<7,"NA", AVERAGE(INDEX(Values, SEQUENCE(7, 1, a-7+1))))))))

but converting that cell reference outside the LAMBDA doesn't work:

=LET(Values, SORTBY(Data[S&P BSE Sensex],Data[DATE]),
     sqn,A1:INDEX(A:A,ROWS(Values)),
     BYROW(ROW(sqn), LAMBDA(a,
           IF(a<7,"NA", AVERAGE(INDEX(Values, SEQUENCE(7, 1, a-7+1)))))))

 

@Viz 

BYROW() assumes you transfer an array as parameter, even if that array has only one element. SEQUENCE() from array works differently, you may compare =SEQUENCE(2) and =SEQUENCE({2}).

 

To fix, you may pass first element of each row array

=LET(
    Values, SORTBY(Data[S&P BSE Sensex], Data[DATE]),
    Sqn, SEQUENCE(ROWS(Values)),
    BYROW(
        Sqn,
        LAMBDA(a,
            IF(
                @a < 7,
                "NA",
                AVERAGE(
                    INDEX(Values, SEQUENCE(7, 1, @a - 7 + 1))
                )
            )
        )
    )
)

 or simply change BYROW on MAP

=LET(
    Values, SORTBY(Data[S&P BSE Sensex], Data[DATE]),
    Sqn, SEQUENCE(ROWS(Values)),
    MAP(
        Sqn,
        LAMBDA(a,
            IF(
                a < 7,
                "NA",
                AVERAGE(
                    CHOOSEROWS(
                        Values,
                        SEQUENCE(7, 1, +a - 7 + 1)
                    )
                )
            )
        )
    )
)

which passes to lambda value, not array.

as always thank Sergei, I seem to always learn something new from you. That said, can you explain:
a) why does it work with a range reference instead of an array? When it's a range, why is the "array of one" treated differently?
As for what is actually happening here is a sample to help:
=AVERAGE(SEQUENCE(5,1,2+{5})) vs =AVERAGE(SEQUENCE(5,1,2+5))
So =AVERAGE(SEQUENCE(5,1,2+5)) eval to =AVERAGE({7;8;9;10;11}) and then =9
BUT =AVERAGE(SEQUENCE(5,1,2+{5})) eval to =7
obviously the {5} is messing it up even though if you use F9 on the SEQUENCE(5,1,2+{5}) you still get {7;8;9;10;11} of which the AVERAGE() should be 9 but my guess is that internally it is treating it as {7};{8};{9};{10};{11} and then AVERAGE(7) ; AVERAGE(8) ; ... and then returning only the first of those results being AVERAGE(7) = 7, but that is more guess than anything

@Viz 

@Sergei Baklan seems to have got to the bottom of the problem.  Yet another example of the failure of the Calc engine to address the problem of arrays of arrays (that is, 'normal' problems).  

 

I do not have much extra to offer; I was interested in trying to master the new AFE.  I decided that the problem is better suited to MAP, so bypassed the problem rather than addressing it.  Rather than showing the anonymous Lambda function on the worksheet, I defined a Named Lambda 'AverageLast7λ' in its Curried form, giving the worksheet formula

 

=LET(
    Values,  TAKE(SORT(Data), , -1),
    Sqn,     SEQUENCE(ROWS(Values)),
    MAP(Sqn, AverageLast7λ(Values))
);

AverageLast7λ
= LAMBDA(v, LAMBDA(a,
    IF(a<7,"NA", 
        AVERAGE(INDEX(v, SEQUENCE(7, 1, a,-1)))
    )
  ));

The SORT and AVERAGE formulas are different, but it is simply change for change's sake.

AverageLast7λ
= LAMBDA(v, LAMBDA(a,
    IF(a<7,"NA", 
        AVERAGE(TAKE(TAKE(v, a),-7))
    )
  ));

also worked.

 


@mtarler wrote:
 When it's a range, why is the "array of one" treated differently?

A range pointing to a single cell isn't an array, it's a scalar. Try TYPE(A1:A1) & TYPE(A1:A2) & TYPE({1}) & TYPE({1,2}). I wish single element arrays were implicitly converted to scalars.

 


@mtarler wrote:
As for what is actually happening here is a sample to help:
=AVERAGE(SEQUENCE(5,1,2+{5})) vs =AVERAGE(SEQUENCE(5,1,2+5))

My observed behaviour (not sure if it's documented or a bug), is when a function that expects scalar arguments receives an array, it outputs an array with the same shape applying the function.

I.e., SEQUENCE({3}) is the same as MAP({3}, LAMBDA(x, SEQUENCE(x)))

The problem is, when the function returns an array itself like SEQUENCE, you're creating an array of arrays which can't be displayed in the grid. In the former case it just uses the first element in the inner array. In the later, MAP, case it gives #CALC! "Nested arrays aren't supported".

 

Related question here too TEXTSPLIT combined with BYROW returns an unexpected result when using - Microsoft Community

@mtarler 

That's better to ask @JoeMcDaid  for more professional explanation.

My guess if we have BYROW on spill, aka array, it takes each element of such array which is value. Each element of the range is an array, auto-lifting doesn't work in such case.

As for the SEQUENCE not sure how it works. With an array in step it looks like it returns only the value for the first step. 

@Peter Bartholomew , hope you are happy with new AFE. Had no time to dig it in details, first impression is quite positive.

Thanks for the detailed explanation and the solution, Sergei.

 

I am still trying to process your explanation fully. But after reading your explanation to @mtarler and his work around with row function, I think I am getting some sense of the problem.

 

I am wondering if ByRow is treating the sequence as matrix instead of a vector when we directly pass it in the function (and therefore the array or array problem).

 

Am I going in the right direction?

Thanks, Peter. That definitely adds a degree of elegance to the formula.

I have been relying too much on Index for looping. I indeed need to do a bit of unlearning and should start working with Map and Take.

@Viz , not sure I understood correctly. IMHO, BYROW() never passes the matrix since that's by row. From the grid it takes an array (vector) since the row in the grid is always an array. From the array it takes current value of the array, it could be value or vector. That's our job to transform such vector/value into single value with lambda, otherwise we have array of arrays. 

@Viz 

I followed @JamesDuley 's link and found a related question from a personal friend of some 20 years standing.  In answering his question, I advanced my own understanding of the problem

TEXTSPLIT combined with BYROW returns an unexpected result when using - Microsoft Community

 

My conclusions were that BYROW is designed to accept a 2D matrix and pass it to the calculation, row at a time.  In the special case of the matrix having only a single column, BYROW will pass single values to the calculation, but they will still be

'Type = 64' and will get caught by any (monumentally undesirable) legacy nested array checks.  Passing the single value (1x1) array can be made to work if you always reference its value using '@', in order to convert it to a scalar value.

 

Where BYROW references a Range, things proceed somewhat differently.  In normal use BYROW passes row ranges to the calculation engine (COUNTIFS will work with such an object as its criterion range for example) but, in the special case, the row might reduce to a single cell reference. 

= TYPE(cell)

will then return a 1 or 2, so the calculation proceeds differently depending on whether the initial BYROW reference is an array or a range object.

 

@Peter Bartholomew Oh, I guess I've seen related post from James here on MTC

 


@Peter Bartholomew wrote:

Where BYROW references a Range, things proceed somewhat differently.  In normal use BYROW passes row ranges to the calculation engine (COUNTIFS will work with such an object as its criterion range for example) but, in the special case, the row might reduce to a single cell reference. 

= TYPE(cell)

will then return a 1 or 2, so the calculation proceeds differently depending on whether the initial BYROW reference is an array or a range object.


 That's an interesting observation. It looks like BYROW internally does INDEX(array, row_num, 0) for each row to pass to the function. INDEX, like this, returns a scalar if, and only if, array is a single column and it's a reference. This is different from the almost equivalent CHOOSEROWS(array, row_num) which always returns an array.

As for you, it is early days. I do not think I have fully mastered the options available on the Names tab. I have been able to do a bit of editing though.
Modules, I am pretty comfortable with.
Grid is amazing. Hopefully, I will soon be able to switch the formula bar off forever!

@Peter Bartholomew 

I have a similar question. Suppose I have an array (A1#) such that: 

2
3

I need to create a sequence that in the end looks something like this: 

2
2
3
3
3
4
4
4
4

 

My thought was the use BYROW(A1#,LAMBDA(x,SEQUENCE(1,4,x,0))) so that I get: 

22N/AN/A
333N/A
4444

 

And then feed that into TOCOL() and use a FILTER() to filter out the "N/A"s. But the function throws a CALC error. 

 

Any thoughts? 

a) this should really be a new thread (and welcome to add link back to this discussion)
b) if you hover over the error you are probably getting the array of arrays error
c) a common work around here is to use REDUCE() and 'build' the array inside and a drop to get rid of the initial value. e.g. (note this is untested and just off the cuff as an example)
=DROP( REDUCE(0, A1#, LAMBDA( p, q, HSTACK( p, SEQUENCE( q, 1, q, 0) ) ) ), 1)