SOLVED

Can SCAN() and BYROW() be combined?

Brass Contributor

The support page for SCAN() contains the following example: 

914422ab-f843-49ba-b0de-082385f9cd11

                                      =SCAN("",A1:C2,LAMBDA(a,b,a&b))

 

My general issue is combining BYROW() or BYCOLUMN() with other functions, but I'll illustrate it with this SCAN() example:

 

How does one SCAN() row by row such that the output it something like:

a, ab, abc

d, de, def

 

=BYROW(A1:C2,LAMBDA(row,SCAN("",row,LAMBDA(a,b,a&b))))

 

doesn't work...

44 Replies

@Sergei Baklan 

Great - looks better. and likely faster too.  For timing maybe replace "" by 0 (or 1) and '&' with '+' then update function name in timer formula. From the times, it looks like your system is running twice as fast! I needed to reduce to n=50 to make it responsive enough on my setup.

@lori_m 

I see, thank you. Here it is

image.png

But that's the best result. Average one is about

image.png

@Sergei Baklan 

Yup looks pretty fast! In any case, native byrow/bycol extended functions would get my vote to bypass the nested array error in general, as suggested in the original post. Then other methods - including thunks - could benefit from being able to process arrays of results more efficiently.

@lori_m 

I could agree, but everything depends on goals and concrete scenario.

@Sergei Baklan 

I have seen the posts but have yet to work through them in detail.  Meanwhile I have tried VSTACK as an alternative to MAKEARRAY, but still using Thunks.  The worksheet formula relies on a number of other named Lambda functions.

= Expandλ(BYROW(data,Scanλ))

Scanλ
= LAMBDA(row,Thunkλ(SCAN("",row,Concatλ)));

Concatλ 
= LAMBDA(a,b,a&b);

Expandλ 
= LAMBDA(dϑ,
    REDUCE(,dϑ,LAMBDA(accϑ,ϑ,Thunkλ(VSTACK(accϑ(),ϑ()))))()
  );

@lori_m 

@TheDub was pretty irritated by the message "Nested arrays are not supported", and I believe rightly so.  Most Excel errors report an error on the part of the user/developer.  This one is different.  The correct answer to a wide class of problems to which one might with to apply dynamic array methods is the array of arrays.  This error is nothing to do with the user, it is simply flagging up a deficiency in the code; one which may not have mattered in the past but which is now centre stage.

 

As a minimum, BYROW, BYCOL and MAKEARRAY should address this limitation and convert the 'column of row arrays' to the equivalent 2D array without involving the user in workarounds.  

@Peter Bartholomew 

Agreed, the previous post was intended to be fully in support of the OP and to reiterate that the proposed BYROW formula should really be made to work in a future update if possible. 

 

I'd really prefer that nested arrays were converted to 2d arrays natively. Google sheets does this eg:
={{1,2};{3,4}} -> {1,2;3,4}
={{1,2},{3,4}} -> {1,2,3,4}
so no need for separate VSTACK / HSTACK functions. I gather from the response you received from the development team that backward compatibility makes such an update more challenging however.

@Peter Bartholomew , I had no doubt you will use thunks. Did you try to integrate with @lori_m timer?

@lori_m

 Afraid implementation of array of arrays requires re-building of calc engine architecture. Other words create new Excel, if not from scratch but close to that. Perhaps I'm wrong, just my impression.

 

Other way is to built in workarounds like we use now in built-in functions, but that will be quite narrow solution which works for concrete cases only.

 

Anyway, I believe this or that solution will be found and we'll see array of arrays one day.  Such functionality is quite demanded for new Excel.

@Sergei Baklan 

No, I need to find some time to work through Lori's solution.  I did some timing runs with my solution using 100x100 and 200x50 grids of single characters using Charles Williams's profiler.  The latter was worse for a range recalc. with timings of 350ms rather than 250ms.

@Sergei Baklan 

The timer code is courtesy of the MS Research team and was used for prototyping some of the new functions. Without access to the new functions I substituted the vstack prototype function from that link and got the results shown in attachment which are a little disappointing and I'm sure thunk results could be improved significantly though unlikely ever to match the faster alternatives listed.

 

I've no idea how much effort would be required to address the nested arrays error but community feedback like this can only help bump it up the priority list!

@lori_m 

Oops, I didn't open the link within your file. Yes, looks like unchanged samples from the prototype.

 

Prioritization is never simple task. I guess Microsoft decides such compromises every few months (have no idea about their planning process). If to simplify, like make Evaluate Formula box resizable or introduce few other lambda functions.

 

My another guess on first place is what demanded by corporate business which gives 85% (third guess) of income.

 

Anyway, sooner or later nested arrays shall appear.

@TheDub 

AFE is not working in my Excel and I just use name manager. 

byR = LAMBDA(L,MAKEARRAY(ROWS(L),COLUMNS(L),LAMBDA(r,c,CONCAT(INDIRECT(ADDRESS(r,1)&":"&ADDRESS(r,c))))))

Starrysky1988_0-1648376196282.png

The formula above will only work if the list is started from cell A1.

To get the dynamic result, I've added another helper name that represents the first cell in the list.

Starrysky1988_6-1648384617876.png

 

 

@Starrysky1988 

I'd suggest to modify a bit which allows to work both with ranges and arrays.

 

ByRR = LAMBDA(A,
    MAKEARRAY(
        rows(A),
        columns(A),
        LAMBDA( n, m,
            CONCAT( INDEX( A, n, SEQUENCE(m) ) )
    )
    )
);

 

 

Performance is bit better than with thunks or makearray() which uses scan(). 

image.png

@Sergei Baklan 

Dear Mr. Sergei Baklan,

Thanks for your highlight and I've learnt something from it. The formula now is perfect.

What I like the most using Excel is the art of combining of different Excel Functions.

If we are using old version of Excel 2016 onwards, we can just simply use "Concat" function as below and copy and paste to the cell we want. 

Starrysky1988_0-1648396341303.png

Sometimes, we make things easy, but sometimes we make them difficult.

May be we want to save a few second of extra work and always want dynamic result.

 

 

@TheDub 

I created another approach to the problem in which many rows are stepped through the SCAN as a block (by again using thunks) and then displayed using MAKEARRAY.

 

= ExpandVλ(SCAN(,testdataVϑ,ConcatThunkλ))

Where
ConcatThunkλ = LAMBDA(aϑ,bϑ,Thunkλ(aϑ()&bϑ()));

testdataVϑ = BYCOL(testdata, Thunkλ);

ExpandVλ = LAMBDA(dϑ,
    MAKEARRAY(n,10, LAMBDA(r,c, INDEX(INDEX(dϑ,1,c)(),r,1)))
);

 

I then performed some timing runs using 10 columns but a varying number of rows.

image.png

What is troubling is that both this solution and the earlier one are of Order 1.7 (I am more used to seeing integer values).  The earlier REDUCE formula appears to be more efficient than that using MAKEARRAY but, either way, the growth means that the approaches are not usable once the row count goes into the 1000s (in which case the time for a recalculation exceeds 1s).  

 

My conclusion is that the array of array problem needs to the addressed by Microsoft.  Otherwise we are likely to find ourselves in a situation of falling back on copy fill which will be a real joy when resizing calculations that are running into the 100,000s of rows!

 

 

The tags to the previous post were the creation of the cat!

@Peter Bartholomew 

Lol... until Microsoft comes up with a better alternative, I'd suggest using the @Sergei Baklan  formulation or something like below on longer arrays:

=REDUCE(INDEX(data,,1),
     SEQUENCE(COLUMNS(data)-1),
     LAMBDA(acc,i,
         HSTACK(acc,
             INDEX(acc,,i)&INDEX(data,,i+1)
             )))

Which performs better would depend on array dimensions, I'd think the method that stacks along the smaller of the row or column dimensions would be preferable. The othe advantage of thunks and the above method is that vector accumulation is supported.

 

BTW, previous timing results that I posted would need to be updated by removing the 'vstack' name - turns out that prototype function was slowing things down a lot.

@lori_m 

Your recommendation was spot on.  The outcome seems to be that the formula now available will have one dimension that grows quadratically whilst the other may be linear.  The formula you suggested and attributed to @Sergei Baklan grows linearly as the number of rows increases and beats the formulae I was playing with hands down; it should get to a million rows with calculation times under 1s!

Similar behavior is achieved by the variation

= REDUCE(INDEX(data,,1), BYCOL(data,Thunkλ),
      LAMBDA(acc,ϑ, HSTACK(acc,TAKE(acc,,-1)+ϑ()))
  )

I hadn't spent enough time on the final posts of the first sheet because I was going to use Charles Williams's timing routines and the Microsoft Research timed regression tests made the workbook very slow.

 

Hi@TheDub try this method that I found on this youtube video https://www.youtube.com/watch?v=ITg5lfm3d14. it uses the MAKEARRAY function.

 

Replace this with your array "CHAR(RANDARRAY(10,10,65,90,TRUE))"

 

 

=LET(
  x,CHAR(RANDARRAY(10,10,65,90,TRUE)),
  MAKEARRAY(ROWS(x),COLUMNS(x),
    LAMBDA(r,c,CONCAT(INDEX(x,r,SEQUENCE(c))))))