Forum Discussion

TheDub's avatar
TheDub
Iron Contributor
Mar 11, 2022
Solved

Can SCAN() and BYROW() be combined?

The support page for SCAN() contains the following example: 

                                      =SCAN("",A1:C2,LAMBDA(a,b,a&b))

 

My general issue is combining BYROW() or BYCOLUMN() with other functions, but I'll illustrate it with this SCAN() example:

 

How does one SCAN() row by row such that the output it something like:

a, ab, abc

d, de, def

 

=BYROW(A1:C2,LAMBDA(row,SCAN("",row,LAMBDA(a,b,a&b))))

 

doesn't work...

  • TheDub 

    It looks like it should work and, indeed, a similar formula with REDUCE does work.  Unfortunately SCAN produces an array on each row, leading to an array of arrays which hasn't been implemented within the calc engine.  There are two distinct approaches to overcome the problem, neither of which is that attractive. 

    The first is to allow SCAN to run through the 6 values as a single sequence but to reset the resulting text to null whenever the scan returns to the first column.  Rather than scanning the text array, I scanned the sequence {0,1,2;3,4,5} which can both be used to look up the text to concatenate and to identify the leading column.

    = SCAN("", SEQUENCE(2,3,0),
       LAMBDA(str,k,
          LET(
             row, 1+QUOTIENT(k,3),
             column, 1+MOD(k,3),
             chr, INDEX(data, row, column),
             IF(column>1,str,"")&chr
          )
       )
    )

    The second approach follows your original formula more closely but, instead of allowing SCAN to build a text array, it builds a Thunk that contains the array.  BYROW then creates an array of 2 thunks and if the result is passed to MAKEARRAY each thunk may be singled out by INDEX, then expanded by passing it a null parameter string and wrapped within a further INDEX, which will return a single value to be assembled into the solution array.

    Thunkλ
    = LAMBDA(x,LAMBDA(x))
    
    Forming a thunk:
    = LET(
        arrayϑ, Thunkλ(array),
    
    Returning its content:
    = arrayϑ()

    I will leave it to you to judge whether the most appropriate answer to your question is 'yes' or 'no'!

     

  • Patrick2788's avatar
    Patrick2788
    Silver Contributor

    TheDub 

    My approach with SCAN:

    ScanByrow
    =LAMBDA(matrix,LET(
        counter, SEQUENCE(ROWS(matrix)),
        data, HSTACK(counter, matrix),
        AccumulateText, LAMBDA(a, v, IF(ISNUMBER(v), "", a & v)),
        Texts, SCAN("", data, AccumulateText),
        DROP(Texts, , 1)
    ))
  • Levi Bailey's avatar
    Levi Bailey
    Copper Contributor

    TheDub PeterBartholomew1 

    Here's another abomination of an approach for combining SCAN() and BYROW():

    =TEXTSPLIT(TEXTJOIN("~",TRUE,BYROW(A1:C2, LAMBDA(array,TEXTJOIN("|",TRUE,SCAN("",array,LAMBDA(a,b,a&b)))))),"|","~")

    It gets around the type incompatibility by joining arrays into strings, processing them, then splitting the string back into an array.  This example uses "|" as a column separator and "~" as a row separator.

     

    I was only doing this as a means to an end, and did not optimise much - hopefully someone finds a way to make it such that there is only one TEXTJOIN.

    A drawback of this approach is that everything gets compressed into a single string. This makes it hard to follow, and I have not checked but there is likely to be a limit to the length of the string that can be processed properly.

     

    Like all the other approaches that I am aware of, it is unwieldy and I would prefer not to burden others with spreadsheets that contain this.

     

    Has anyone found any better approaches since this thread was active a couple of years ago?

    • karimbaez's avatar
      karimbaez
      Copper Contributor

      Levi BaileyI found a way using MakeArray

       

       

      =LET(data,
        SEQUENCE(2,3),
        MAKEARRAY(ROWS(data),COLUMNS(data),
          LAMBDA(row,col,
            CONCAT(
              INDEX(data,row,SEQUENCE(col))
            )
          )
        )
      )

       

      Found it on this YouTube video

    • PeterBartholomew1's avatar
      PeterBartholomew1
      Silver Contributor

      Levi Bailey 

      Hi Levi.

      I wouldn't say that much has changed.  It really is up to Microsoft to sort the mess that is of their making.  I think the problem of helper functions not returning arrays of arrays is getting more important day by day.  If one measures the need by the proportion of problems for which the array of arrays is a central feature of the problem which, as the scope of dynamic array formulas increases, may occur several times within a formula and resolving the issue becomes more important.  Mind you, I would not expect ideas such as Craig Hatmaker 's 5g functions to fall within the original specification.  Most spreadsheet users still are attached to horrible 'and send down' relative references, so it is only a minority that require the DA changes to be brought to fruition.

       

      I had made some progress towards overcoming the problem by using Thunks to hold the inner arrays, and having other user functions specifically written to expand the arrays of thunks to display the content.  

      A generalised Lambda helper function that return arrays of arrays using bisection. - Microsoft Community Hub

      I still have some ideas to follow up, for example, expanding and stacking thunks 16 at a time, so 5 REDUCE steps would process an entire Excel column.  The challenge is to both achieve acceptable efficiency and to hide the unsellable contortions from the end user!

       

  • karimbaez's avatar
    karimbaez
    Copper Contributor

    HiTheDub try this method that I found on this youtube video https://www.youtube.com/watch?v=ITg5lfm3d14. it uses the MAKEARRAY function.

     

    Replace this with your array "CHAR(RANDARRAY(10,10,65,90,TRUE))"

     

     

    =LET(
      x,CHAR(RANDARRAY(10,10,65,90,TRUE)),
      MAKEARRAY(ROWS(x),COLUMNS(x),
        LAMBDA(r,c,CONCAT(INDEX(x,r,SEQUENCE(c))))))

     

     

     

     

  • TheDub 

    I created another approach to the problem in which many rows are stepped through the SCAN as a block (by again using thunks) and then displayed using MAKEARRAY.

     

    = ExpandVλ(SCAN(,testdataVϑ,ConcatThunkλ))
    
    Where
    ConcatThunkλ = LAMBDA(aϑ,bϑ,Thunkλ(aϑ()&bϑ()));
    
    testdataVϑ = BYCOL(testdata, Thunkλ);
    
    ExpandVλ = LAMBDA(dϑ,
        MAKEARRAY(n,10, LAMBDA(r,c, INDEX(INDEX(dϑ,1,c)(),r,1)))
    );

     

    I then performed some timing runs using 10 columns but a varying number of rows.

    What is troubling is that both this solution and the earlier one are of Order 1.7 (I am more used to seeing integer values).  The earlier REDUCE formula appears to be more efficient than that using MAKEARRAY but, either way, the growth means that the approaches are not usable once the row count goes into the 1000s (in which case the time for a recalculation exceeds 1s).  

     

    My conclusion is that the array of array problem needs to the addressed by Microsoft.  Otherwise we are likely to find ourselves in a situation of falling back on copy fill which will be a real joy when resizing calculations that are running into the 100,000s of rows!

     

     

      • lori_m's avatar
        lori_m
        Steel Contributor

        PeterBartholomew1 

        Lol... until Microsoft comes up with a better alternative, I'd suggest using the SergeiBaklan  formulation or something like below on longer arrays:

        =REDUCE(INDEX(data,,1),
             SEQUENCE(COLUMNS(data)-1),
             LAMBDA(acc,i,
                 HSTACK(acc,
                     INDEX(acc,,i)&INDEX(data,,i+1)
                     )))

        Which performs better would depend on array dimensions, I'd think the method that stacks along the smaller of the row or column dimensions would be preferable. The othe advantage of thunks and the above method is that vector accumulation is supported.

         

        BTW, previous timing results that I posted would need to be updated by removing the 'vstack' name - turns out that prototype function was slowing things down a lot.

  • TheDub 

    AFE is not working in my Excel and I just use name manager. 

    byR = LAMBDA(L,MAKEARRAY(ROWS(L),COLUMNS(L),LAMBDA(r,c,CONCAT(INDIRECT(ADDRESS(r,1)&":"&ADDRESS(r,c))))))

    The formula above will only work if the list is started from cell A1.

    To get the dynamic result, I've added another helper name that represents the first cell in the list.

     

     

    • SergeiBaklan's avatar
      SergeiBaklan
      MVP

      Starrysky1988 

      I'd suggest to modify a bit which allows to work both with ranges and arrays.

       

      ByRR = LAMBDA(A,
          MAKEARRAY(
              rows(A),
              columns(A),
              LAMBDA( n, m,
                  CONCAT( INDEX( A, n, SEQUENCE(m) ) )
          )
          )
      );

       

       

      Performance is bit better than with thunks or makearray() which uses scan(). 

      • Starrysky1988's avatar
        Starrysky1988
        Iron Contributor

        SergeiBaklan 

        Dear Mr. Sergei Baklan,

        Thanks for your highlight and I've learnt something from it. The formula now is perfect.

        What I like the most using Excel is the art of combining of different Excel Functions.

        If we are using old version of Excel 2016 onwards, we can just simply use "Concat" function as below and copy and paste to the cell we want. 

        Sometimes, we make things easy, but sometimes we make them difficult.

        May be we want to save a few second of extra work and always want dynamic result.

         

         

  • lori_m's avatar
    lori_m
    Steel Contributor

    Following the announcement of the new array functions, I wondered if VSTACK or some other function could be put to use in this type of situation to extend BYROW?

     

    In the absence of a built-in function, maybe define instead:

    BYROWS
    =LAMBDA(array,function,
         REDUCE(1,
             SEQUENCE(ROWS(array)),
             LAMBDA(A,i,
                  IF(SEQUENCE(i)=i,
                      function(INDEX(array,i,)),
                      A))))

    which could be used in conjunction with SCAN as shown... 

    (Out of interest I did rough timings of the various implementations on a 100x100 array)

     

     

     

    • lori_m 

      My first attempt

      reduceV=
      LAMBDA(array,
          DROP(
              REDUCE(
                  "",
                  SEQUENCE(ROWS(array)),
                  LAMBDA(a, n,
                      VSTACK(
                          a,
                          SCAN(, CHOOSEROWS(array, n),
                              LAMBDA(acc, v, acc & v))
                      )
                  )
              ),
              1
          )
      );

      which gives

      Not sure how to integrate it with timer.

      • lori_m's avatar
        lori_m
        Steel Contributor

        SergeiBaklan 

        Great - looks better. and likely faster too.  For timing maybe replace "" by 0 (or 1) and '&' with '+' then update function name in timer formula. From the times, it looks like your system is running twice as fast! I needed to reduce to n=50 to make it responsive enough on my setup.

  • TheDub 

    OK.  So I am beginning to get fed up with the arrays of arrays problem with the 365 lambda helper functions.  Isn't that what one uses spreadsheets for?

    Re: merging rows - Microsoft Tech Community

    Every table is a list of records.  Crosstabs are arrays of arrays.  I would say that up to 80% of my workbooks would benefit from being able to handle arrays of arrays efficiently.  

     

    OK, thunks do offer a way around the problem and I am getting better at using them.  The attached workbook has Lambda functions Thunkλ and ExpandThunkλ that I have used to form and expand the arrays of thunks, but it is still a pain and I struggle to see how I would convince a normal Excel user that this is better than 'and copy down (ugh!)'.

  • lori_m's avatar
    lori_m
    Steel Contributor

    TheDub 

    I had also tried to raise this point several times prior to release.  My hope was that something like this would work here,

    =SCAN({"";""},data,LAMBDA(a,b,a&b)))

    which would be a more efficient implementation of:

    =REDUCE("",
         SEQUENCE(COLUMNS(data)),
            LAMBDA(acc,i,
               IF(SEQUENCE(,i)=i,
                  INDEX(acc,,i-1)&INDEX(data,,i),
                  acc)))

    Instead a reusable 'SCANBYROW' function can be implemented via one of the methods Peter suggested, or formula above but this is far from ideal.

    • TheDub's avatar
      TheDub
      Iron Contributor

      As the kids nowadays say, SMH... Functions like that REDUCE() are some kind of abomination: six sub-functions (two of which are used twice), eight sets of "( )"s and an endless stream of ",". After a few of those, one's brain just shuts down 🙂

      • lori_m's avatar
        lori_m
        Steel Contributor
        Fair enough, I think many other Excellers out there would agree with such sentiments! As per Peter's earlier comments, formula construction is starting to become more the domain of the developer community - similar examples of functional code abound on stackoverflow.

        The post was more to illustrate how an array of results could be generated in 2d rather than an array of 1d-arrays which is not allowed. If similar updates were made to SCAN/BYROW/etc functions one wouldn't need to resort to such headache-inducing formulas (though I'm sure the C++ source code of those functions would be many times worse!)

  • TheDub 

    It looks like it should work and, indeed, a similar formula with REDUCE does work.  Unfortunately SCAN produces an array on each row, leading to an array of arrays which hasn't been implemented within the calc engine.  There are two distinct approaches to overcome the problem, neither of which is that attractive. 

    The first is to allow SCAN to run through the 6 values as a single sequence but to reset the resulting text to null whenever the scan returns to the first column.  Rather than scanning the text array, I scanned the sequence {0,1,2;3,4,5} which can both be used to look up the text to concatenate and to identify the leading column.

    = SCAN("", SEQUENCE(2,3,0),
       LAMBDA(str,k,
          LET(
             row, 1+QUOTIENT(k,3),
             column, 1+MOD(k,3),
             chr, INDEX(data, row, column),
             IF(column>1,str,"")&chr
          )
       )
    )

    The second approach follows your original formula more closely but, instead of allowing SCAN to build a text array, it builds a Thunk that contains the array.  BYROW then creates an array of 2 thunks and if the result is passed to MAKEARRAY each thunk may be singled out by INDEX, then expanded by passing it a null parameter string and wrapped within a further INDEX, which will return a single value to be assembled into the solution array.

    Thunkλ
    = LAMBDA(x,LAMBDA(x))
    
    Forming a thunk:
    = LET(
        arrayϑ, Thunkλ(array),
    
    Returning its content:
    = arrayϑ()

    I will leave it to you to judge whether the most appropriate answer to your question is 'yes' or 'no'!

     

    • TheDub's avatar
      TheDub
      Iron Contributor
      First - and obviously - thanks for you thoughtful reply. I'll play with both options against other examples, but my gut feeling is that #2 will come up ahead.

      Second - and this is sort of a general rant - having played around with many of the new shiny toys recently introduced to Excel, I must say the whole process feels strange, like "two steps forward - one step back". This one is a great example of Excel being inconsistent in applying its own logic: if something can be done by row (or by column) it should be always doable that way, instead of letting users find out (and memorizing) the circumstances where the logic works and where it doesn't. When you say, about the two approaches (in this case) to overcoming the problem, that "neither of which is that attractive" you are being, I feel, very polite. The near constant need to look for "alternatives", "workarounds", "hacks", etc. to resolve issues which should be straightforward is utterly exhausting....
      • PeterBartholomew1's avatar
        PeterBartholomew1
        Silver Contributor

        TheDub 

        I see your point.  The best I can say is that the pitfalls are relatively few. 

        A bit like traditional Excel where one tries to explain why

        = COUNTIFS(MONTH(date),6))

        doesn't work.  Because CONTIFS doesn't accept array parameters in place of a range reference always sounded a bit lame.

         

        In the present case, I got to

        = MAKEARRAY(2,3, Expandλ(
              BYROW(data, LAMBDA(row,Thunkλ(SCAN("", row, Concatλ))))
           ))
        
        where
        Expandλ
        = LAMBDA(ϑ, 
            LAMBDA(r,c, 
                INDEX(INDEX(ϑ, r,1)(),,c)
            ));
        
        Concatλ
        = LAMBDA(array,cell, array&cell);

        It is heavy going but, once done, it can be reused.  Hopefully, not too often!

Resources