Can SCAN() and BYROW() be combined?

The SCAN function (https://support.microsoft.com/en-us/office/scan-function-d58dfd11-9969-4439-b2dc-e7062724de29):
=SCAN("",A1:C2,LAMBDA(a,b,a&b))
My general issue is...


PeterBartholomew1
Mar 11, 2022
TheDub
It looks like it should work and, indeed, a similar formula with REDUCE does work. Unfortunately, SCAN returns an array for each row, so BYROW would have to build an array of arrays, which the calc engine does not support. There are two distinct ways to work around the problem, neither of which is particularly attractive.
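A rough Python model of why the combination fails. This is a sketch only: `scan` and `byrow` below are hypothetical stand-ins for the Excel functions, not their real implementation. BYROW accepts one value per row, so a REDUCE result fits, but a SCAN result (a whole row of values) does not. Excel typically reports this case as #CALC!; the model simply raises an error carrying that label.

```python
from functools import reduce

def scan(initial, values, fn):
    """Model of SCAN over a flat sequence: returns every intermediate accumulator."""
    out, acc = [], initial
    for v in values:
        acc = fn(acc, v)
        out.append(acc)
    return out

def byrow(rows, fn):
    """Model of BYROW: fn must return a single value for each row."""
    results = []
    for r in rows:
        value = fn(r)
        if isinstance(value, list):
            # Assumed stand-in for the error Excel returns for nested arrays
            raise ValueError("#CALC!: nested arrays are not supported")
        results.append(value)
    return results

data = [["a", "b", "c"], ["d", "e", "f"]]

# REDUCE yields one value per row, so BYROW accepts it
print(byrow(data, lambda r: reduce(lambda a, b: a + b, r, "")))  # ['abc', 'def']

# SCAN yields a list per row, so BYROW rejects it
try:
    byrow(data, lambda r: scan("", r, lambda a, b: a + b))
    nested_ok = True
except ValueError as err:
    nested_ok = False
    print(err)  # #CALC!: nested arrays are not supported
```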
The first is to let SCAN run through all 6 values as a single sequence but to reset the accumulated text to an empty string whenever the scan returns to the first column. Rather than scanning the text array, I scanned the sequence {0,1,2;3,4,5}, which can be used both to look up the text to concatenate and to identify the leading column.
= SCAN("", SEQUENCE(2,3,0),
     LAMBDA(str,k,
         LET(
             row, 1+QUOTIENT(k,3),
             column, 1+MOD(k,3),
             chr, INDEX(data, row, column),
             IF(column>1,str,"")&chr
         )))
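The index arithmetic can be checked with a small Python model (hypothetical helper names; `scan` mimics SCAN walking the array row by row). As in the formula, k runs from 0 to 5, QUOTIENT and MOD become // and %, and the accumulator is reset whenever column 1 comes round again.

```python
def scan(initial, values, fn):
    """Model of SCAN: returns every intermediate accumulator in order."""
    out, acc = [], initial
    for v in values:
        acc = fn(acc, v)
        out.append(acc)
    return out

data = [["a", "b", "c"], ["d", "e", "f"]]
n_rows, n_cols = len(data), len(data[0])

def step(s, k):
    # Same arithmetic as the LET, with 1-based row and column
    row = 1 + k // n_cols
    column = 1 + k % n_cols
    chr_ = data[row - 1][column - 1]
    # Reset the accumulated text when the scan reaches the first column
    return (s if column > 1 else "") + chr_

# SEQUENCE(2,3,0) read row by row is just 0..5
flat = scan("", range(n_rows * n_cols), step)
# Reshape back to the 2 x 3 layout that SCAN returns
result = [flat[i * n_cols:(i + 1) * n_cols] for i in range(n_rows)]
print(result)  # [['a', 'ab', 'abc'], ['d', 'de', 'def']]
```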
The second approach follows your original formula more closely but, instead of letting SCAN build a text array, it builds a thunk that contains the array. BYROW then creates an array of 2 thunks. If that result is passed to MAKEARRAY, each thunk can be singled out with INDEX, expanded by calling it with an empty parameter list, and wrapped in a further INDEX that returns a single value to be assembled into the solution array.
Defining the thunk: Thunkλ = LAMBDA(x,LAMBDA(x))
Forming a thunk: = LET( arrayϑ, Thunkλ(array), ...
Returning its content: = arrayϑ()
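A Python sketch of the thunk idea, assuming Python closures as a stand-in for LAMBDA values (`thunk` and `scan_row` are hypothetical names). Each row's scan is wrapped in a zero-argument function, so the BYROW-style step sees only one value per row; the MAKEARRAY-style step then expands a thunk with () and picks out a single element.

```python
def thunk(x):
    """Like Thunkλ = LAMBDA(x, LAMBDA(x)): wrap a value in a zero-argument function."""
    return lambda: x

def scan_row(row):
    """Running concatenation along one row (what SCAN produces for that row)."""
    out, acc = [], ""
    for ch in row:
        acc += ch
        out.append(acc)
    return out

data = [["a", "b", "c"], ["d", "e", "f"]]

# BYROW step: each row yields ONE value, a thunk holding that row's scan
thunks = [thunk(scan_row(r)) for r in data]

# MAKEARRAY step: select thunk i, expand it with (), then take element j
result = [[thunks[i]()[j] for j in range(len(data[0]))] for i in range(len(data))]
print(result)  # [['a', 'ab', 'abc'], ['d', 'de', 'def']]
```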
I will leave it to you to judge whether the most appropriate answer to your question is 'yes' or 'no'!

PeterBartholomew1

Silver Contributor

Mar 28, 2022

The tags on the previous post were the creation of the cat!

scan by row.xlsx (35 KB)

lori_m

Steel Contributor

Mar 28, 2022

PeterBartholomew1

Lol... until Microsoft comes up with a better alternative, I'd suggest using SergeiBaklan's formulation, or something like the formula below for longer arrays:

=REDUCE(INDEX(data,,1),
     SEQUENCE(COLUMNS(data)-1),
     LAMBDA(acc,i,
         HSTACK(acc,
             INDEX(acc,,i)&INDEX(data,,i+1)
             )))
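A Python model of the REDUCE/HSTACK pattern above, with hypothetical helpers `col` and `hstack` playing the roles of INDEX(m,,j) and HSTACK. Each step concatenates an entire column at once, so the accumulator grows by one whole column per step rather than cell by cell.

```python
from functools import reduce

data = [["a", "b", "c"], ["d", "e", "f"]]

def col(m, j):
    """Like INDEX(m,,j) with 1-based j: the j-th column as a list."""
    return [r[j - 1] for r in m]

def hstack(m, c):
    """Like HSTACK(m, c): append column c to the right of m."""
    return [r + [v] for r, v in zip(m, c)]

def step(acc, i):
    # INDEX(acc,,i)&INDEX(data,,i+1), computed for the whole column
    new_col = [a + d for a, d in zip(col(acc, i), col(data, i + 1))]
    return hstack(acc, new_col)

initial = [[v] for v in col(data, 1)]                   # INDEX(data,,1)
result = reduce(step, range(1, len(data[0])), initial)  # SEQUENCE(COLUMNS(data)-1)
print(result)  # [['a', 'ab', 'abc'], ['d', 'de', 'def']]
```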

Which performs better depends on the array dimensions; I'd think the method that stacks along the smaller of the row or column dimensions would be preferable. The other advantage of thunks and the above method is that vector accumulation is supported.

BTW, the timing results I posted previously would need to be updated by removing the 'vstack' name; it turns out that prototype function was slowing things down a lot.

PeterBartholomew1
Silver Contributor
Mar 30, 2022
lori_m
Your recommendation was spot on. The outcome seems to be that, with the functions now available, calculation time grows quadratically with one dimension of the array while it may grow only linearly with the other. The formula you suggested and attributed to SergeiBaklan grows linearly as the number of rows increases and beats the formulae I was playing with hands down; it should reach a million rows with calculation times under 1 s!
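One plausible explanation for the quadratic/linear split, offered as an assumption rather than knowledge of Excel's internals: if each HSTACK step writes out the whole, wider accumulator, the number of cells written grows linearly with the row count but quadratically with the number of stacking steps. The hypothetical `hstack_copy_cost` below only counts those cells under that assumed model.

```python
def hstack_copy_cost(n_rows, n_cols):
    """Cells written if every HSTACK step rewrites the whole accumulator (assumed model)."""
    total = 0
    width = 1                       # REDUCE starts from the first column only
    for _ in range(n_cols - 1):     # one HSTACK per remaining column
        width += 1
        total += n_rows * width     # the wider array is written out in full
    return total

for rows, cols in [(1000, 10), (2000, 10), (1000, 20)]:
    print(rows, cols, hstack_copy_cost(rows, cols))
# 1000 10 54000
# 2000 10 108000
# 1000 20 209000
```

Under this model, doubling the rows doubles the work while doubling the columns roughly quadruples it, which is consistent with stacking along the smaller dimension being preferable.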
Similar scaling behavior is achieved by the following variation, which accumulates numeric totals with + rather than concatenating text:
= REDUCE(INDEX(data,,1),
     BYCOL(DROP(data,,1),Thunkλ),
     LAMBDA(acc,ϑ,
         HSTACK(acc,TAKE(acc,,-1)+ϑ())
         ))
I hadn't spent enough time on the final posts of the first sheet because I was planning to use Charles Williams's timing routines, and the Microsoft Research timed regression tests made the workbook very slow.
