Forum Discussion

Viz's avatar
Viz
Brass Contributor
Nov 18, 2022

ByRow function produces two different results depending on whether row is range or array

In a nutshell, I am trying to compute average with a combination of ByRow, Index, and Average. I needed a sequence number array for this algorithm. When I put this sequence number in a range and use that range in ByRow function, it works great. But when I use that sequence function directly inside a Let(), it gives me wrong answers.

 

I am sharing the codes I have tried, below:

 

This is the first formula I wrote:

 

 

 

 

=LET(
    Values, SORTBY(Data[Sensex], Data[DATE]),
    Sqn, SEQUENCE(ROWS(Values)),
    BYROW(
        Sqn,
        LAMBDA(a,
            IF(
                a < 7,
                "NA",
                AVERAGE(
                    INDEX(Values, SEQUENCE(7, 1, a - 7 + 1))
                )
            )
        )
    )
)​

 

 

 

 

This one produced "a" set of answers. But the answers were slightly incorrect.  To figure out where the problem was, all I did was put the sequences in a separate range and use that range in the ByRow function and this time I got perfect results:

 

So, this is the first part to get the sequence of numbers

 

 

 

In cell, say E4

=LET(
    Values, SORTBY(Data[Sensex], Data[DATE]),
    Sqn, SEQUENCE(ROWS(Values)),
    Sqn
)

 

 

 

 

And this is the second part, where I use the ByRow function and specify the range as an array

 

 

 

 

In cell, say F4

=LET(
    Values, SORTBY(Data[S&P BSE Sensex], Data[DATE]),
    BYROW(
        E4#,
        LAMBDA(a,
            IF(
                a < 7,
                "NA",
                AVERAGE(
                    INDEX(Values, SEQUENCE(7, 1, a - 7 + 1))
                )
            )
        )
    )
)

 

 

 

 

And now I am getting the perfect answer. Is this a bug or am I doing something wrong here:

   PeterBartholomew1 Craig Hatmaker  SergeiBaklan lori_m

 

  • Viz 

    I followed JamesDuley 's link and found a related question from a personal friend of some 20 years standing.  In answering his question, I advanced my own understanding of the problem

    TEXTSPLIT combined with BYROW returns an unexpected result when using - Microsoft Community

     

    My conclusions were that BYROW is designed to accept a 2D matrix and pass it to the calculation, row at a time.  In the special case of the matrix having only a single column, BYROW will pass single values to the calculation, but they will still be

    'Type = 64' and will get caught by any (monumentally undesirable) legacy nested array checks.  Passing the single value (1x1) array can be made to work if you always reference its value using '@', in order to convert it to a scalar value.

     

    Where BYROW references a Range, things proceed somewhat differently.  In normal use BYROW passes row ranges to the calculation engine (COUNTIFS will work with such an object as its criterion range for example) but, in the special case, the row might reduce to a single cell reference. 

    = TYPE(cell)

    will then return a 1 or 2, so the calculation proceeds differently depending on whether the initial BYROW reference is an array or a range object.

     

    • JamesDuley's avatar
      JamesDuley
      Copper Contributor

       


      PeterBartholomew1 wrote:

      Where BYROW references a Range, things proceed somewhat differently.  In normal use BYROW passes row ranges to the calculation engine (COUNTIFS will work with such an object as its criterion range for example) but, in the special case, the row might reduce to a single cell reference. 

      = TYPE(cell)

      will then return a 1 or 2, so the calculation proceeds differently depending on whether the initial BYROW reference is an array or a range object.


       That's an interesting observation. It looks like BYROW internally does INDEX(array, row_num, 0) for each row to pass to the function. INDEX, like this, returns a scalar if, and only if, array is a single column and it's a reference. This is different from the almost equivalent CHOOSEROWS(array, row_num) which always returns an array.

  • Viz 

    SergeiBaklan seems to have got to the bottom of the problem.  Yet another example of the failure of the Calc engine to address the problem of arrays of arrays (that is, 'normal' problems).  

     

    I do not have much extra to offer; I was interested in trying to master the new AFE.  I decided that the problem is better suited to MAP, so bypassed the problem rather than addressing it.  Rather than showing the anonymous Lambda function on the worksheet, I defined a Named Lambda 'AverageLast7λ' in its Curried form, giving the worksheet formula

     

    =LET(
        Values,  TAKE(SORT(Data), , -1),
        Sqn,     SEQUENCE(ROWS(Values)),
        MAP(Sqn, AverageLast7λ(Values))
    );
    
    AverageLast7λ
    = LAMBDA(v, LAMBDA(a,
        IF(a<7,"NA", 
            AVERAGE(INDEX(v, SEQUENCE(7, 1, a,-1)))
        )
      ));

    The SORT and AVERAGE formulas are different, but it is simply change for change's sake.

      • PeterBartholomew1's avatar
        PeterBartholomew1
        Silver Contributor
        As for you, it is early days. I do not think I have fully mastered the options available on the Names tab. I have been able to do a bit of editing though.
        Modules, I am pretty comfortable with.
        Grid is amazing. Hopefully, I will soon be able to switch the formula bar off forever!
    • PeterBartholomew1's avatar
      PeterBartholomew1
      Silver Contributor
      AverageLast7λ
      = LAMBDA(v, LAMBDA(a,
          IF(a<7,"NA", 
              AVERAGE(TAKE(TAKE(v, a),-7))
          )
        ));

      also worked.

      • Viz's avatar
        Viz
        Brass Contributor
        Thanks, Peter. That definitely adds a degree of elegance to the formula.

        I have been relying too much on Index for looping. I indeed need to do a bit of unlearning and should start working with Map and Take.
  • Viz 

    BYROW() assumes you transfer an array as parameter, even if that array has only one element. SEQUENCE() from array works differently, you may compare =SEQUENCE(2) and =SEQUENCE({2}).

     

    To fix, you may pass first element of each row array

    =LET(
        Values, SORTBY(Data[S&P BSE Sensex], Data[DATE]),
        Sqn, SEQUENCE(ROWS(Values)),
        BYROW(
            Sqn,
            LAMBDA(a,
                IF(
                    @a < 7,
                    "NA",
                    AVERAGE(
                        INDEX(Values, SEQUENCE(7, 1, @a - 7 + 1))
                    )
                )
            )
        )
    )

     or simply change BYROW on MAP

    =LET(
        Values, SORTBY(Data[S&P BSE Sensex], Data[DATE]),
        Sqn, SEQUENCE(ROWS(Values)),
        MAP(
            Sqn,
            LAMBDA(a,
                IF(
                    a < 7,
                    "NA",
                    AVERAGE(
                        CHOOSEROWS(
                            Values,
                            SEQUENCE(7, 1, +a - 7 + 1)
                        )
                    )
                )
            )
        )
    )

    which passes to lambda value, not array.

    • Viz's avatar
      Viz
      Brass Contributor

      Thanks for the detailed explanation and the solution, Sergei.

       

      I am still trying to process your explanation fully. But after reading your explanation to mtarler and his work around with row function, I think I am getting some sense of the problem.

       

      I am wondering if ByRow is treating the sequence as matrix instead of a vector when we directly pass it in the function (and therefore the array or array problem).

       

      Am I going in the right direction?

      • Viz , not sure I understood correctly. IMHO, BYROW() never passes the matrix since that's by row. From the grid it takes an array (vector) since the row in the grid is always an array. From the array it takes current value of the array, it could be value or vector. That's our job to transform such vector/value into single value with lambda, otherwise we have array of arrays. 

    • mtarler's avatar
      mtarler
      Silver Contributor
      as always thank Sergei, I seem to always learn something new from you. That said, can you explain:
      a) why does it work with a range reference instead of an array? When it's a range, why is the "array of one" treated differently?
      As for what is actually happening here is a sample to help:
      =AVERAGE(SEQUENCE(5,1,2+{5})) vs =AVERAGE(SEQUENCE(5,1,2+5))
      So =AVERAGE(SEQUENCE(5,1,2+5)) eval to =AVERAGE({7;8;9;10;11}) and then =9
      BUT =AVERAGE(SEQUENCE(5,1,2+{5})) eval to =7
      obviously the {5} is messing it up even though if you use F9 on the SEQUENCE(5,1,2+{5}) you still get {7;8;9;10;11} of which the AVERAGE() should be 9 but my guess is that internally it is treating it as {7};{8};{9};{10};{11} and then AVERAGE(7) ; AVERAGE(8) ; ... and then returning only the first of those results being AVERAGE(7) = 7, but that is more guess than anything
      • mtarler 

        That's better to ask JoeMcDaid  for more professional explanation.

        My guess if we have BYROW on spill, aka array, it takes each element of such array which is value. Each element of the range is an array, auto-lifting doesn't work in such case.

        As for the SEQUENCE not sure how it works. With an array in step it looks like it returns only the value for the first step. 

  • mtarler's avatar
    mtarler
    Silver Contributor

    I don't have an answer why right now but your 'wrong' answer are just your original values NOT averaged offset by 7. That is probably a good clue what is happening.

    One more clue: This variation also works:

    =LET(Values, SORTBY(Data[S&P BSE Sensex],Data[DATE]),
         Sqn, SEQUENCE(ROWS(Values)),
         Seven, SEQUENCE(7),
         BYROW(Sqn, LAMBDA(a,
             IF(a<7,"NA", AVERAGE(INDEX(Values, a-7+Seven))))))

    Basically setting up the {1,2,3,4,5,6,7} array ahead of time and then simple math inside

    • Viz's avatar
      Viz
      Brass Contributor
      That is a good catch. So, I think in the bug case, the index is not returning the entire array but it is just returning the first value.
      • mtarler's avatar
        mtarler
        Silver Contributor

        Viz  a couple more interesting nuances.  Sending an arbitrary array reference in and then converting that to a number array using ROW() works:

        =LET(Values, SORTBY(Data[S&P BSE Sensex],Data[DATE]),
             sqn,A1:INDEX(A:A,ROWS(Values)),
             BYROW(sqn, LAMBDA(in,
                   LET(a,ROW(in),
                       IF(a<7,"NA", AVERAGE(INDEX(Values, SEQUENCE(7, 1, a-7+1))))))))

        but converting that cell reference outside the LAMBDA doesn't work:

        =LET(Values, SORTBY(Data[S&P BSE Sensex],Data[DATE]),
             sqn,A1:INDEX(A:A,ROWS(Values)),
             BYROW(ROW(sqn), LAMBDA(a,
                   IF(a<7,"NA", AVERAGE(INDEX(Values, SEQUENCE(7, 1, a-7+1)))))))

         

Resources