Forum Discussion
Can SCAN() and BYROW() be combined?
- Mar 11, 2022
It looks like it should work and, indeed, a similar formula with REDUCE does work. Unfortunately SCAN produces an array on each row, leading to an array of arrays which hasn't been implemented within the calc engine. There are two distinct approaches to overcome the problem, neither of which is that attractive.
The first is to allow SCAN to run through the 6 values as a single sequence, but to reset the accumulated text to null whenever the scan returns to the first column. Rather than scanning the text array, I scanned the sequence {0,1,2;3,4,5}, which can be used both to look up the text to concatenate and to identify the leading column.
= SCAN("", SEQUENCE(2,3,0),
    LAMBDA(str,k,
      LET(
        row,    1 + QUOTIENT(k,3),
        column, 1 + MOD(k,3),
        chr,    INDEX(data, row, column),
        IF(column>1, str, "") & chr
      )
    )
  )
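For readers who think in code, here is a rough Python equivalent of that single-sequence scan (a sketch only; `data` stands in for the 2×3 worksheet range, and the variable names are mine, not the formula's):

```python
# Sketch of the single-sequence SCAN with a reset at each leading column.
data = [["a", "b", "c"],
        ["d", "e", "f"]]

rows, cols = 2, 3
result = [["" for _ in range(cols)] for _ in range(rows)]
acc = ""
for k in range(rows * cols):                  # SEQUENCE(2,3,0) -> k = 0..5
    row, col = divmod(k, cols)                # QUOTIENT / MOD in the formula
    ch = data[row][col]
    acc = (acc if col > 0 else "") + ch       # reset at the leading column
    result[row][col] = acc
```

Each row of `result` ends up holding the running concatenation of that row's values, restarted at column 1.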
The second approach follows your original formula more closely but, instead of allowing SCAN to build a text array, it builds a thunk that contains the array. BYROW then creates an array of 2 thunks. If that result is passed to MAKEARRAY, each thunk can be singled out by INDEX, expanded by calling it with an empty parameter list, and wrapped within a further INDEX, which returns a single value to be assembled into the solution array.
Thunkλ = LAMBDA(x, LAMBDA(x))
Forming a thunk:        arrayϑ = Thunkλ(array)
Returning its content:  = arrayϑ()
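The thunk idea translates directly to any language with closures. A minimal Python sketch (illustrative only; the name `thunk` is my own, not part of the worksheet):

```python
# A thunk is a zero-argument closure wrapping a value: calling it returns
# the value. This mirrors Thunkλ = LAMBDA(x, LAMBDA(x)).
def thunk(x):
    return lambda: x

array = [1, 2, 3]
array_t = thunk(array)    # forming a thunk
contents = array_t()      # returning its content
```

The payoff is that an array can be passed around as a single scalar-like object, sidestepping the array-of-arrays limitation.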
I will leave it to you to judge whether the most appropriate answer to your question is 'yes' or 'no'!
I created another approach to the problem in which many rows are stepped through the SCAN as a block (by again using thunks) and then displayed using MAKEARRAY.
= ExpandVλ(SCAN(,testdataVϑ,ConcatThunkλ))
Where
ConcatThunkλ = LAMBDA(aϑ,bϑ,Thunkλ(aϑ()&bϑ()));
testdataVϑ = BYCOL(testdata, Thunkλ);
ExpandVλ = LAMBDA(dϑ,
MAKEARRAY(n,10, LAMBDA(r,c, INDEX(INDEX(dϑ,1,c)(),r,1)))
);
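A hedged Python sketch of this block-scan: wrap each column in a thunk, scan the thunks with a concatenating combiner, then unwrap into the output array. The names (`thunk`, `concat_thunk`, `testdata`) are mine, and the blank initial accumulator is elided by starting the scan from the first column:

```python
# Each column of the test data is wrapped in a thunk; the scan accumulates
# thunks of cumulatively concatenated column vectors; finally each thunk is
# unwrapped element-wise, as ExpandVλ/MAKEARRAY does with INDEX.
def thunk(x):
    return lambda: x

testdata = [["a", "b", "c"],
            ["d", "e", "f"]]

# BYCOL(testdata, Thunkλ): one thunk per column vector
col_thunks = [thunk([row[c] for row in testdata])
              for c in range(len(testdata[0]))]

# ConcatThunkλ: combine two column thunks into a thunk of concatenated text
def concat_thunk(a_t, b_t):
    return thunk([a + b for a, b in zip(a_t(), b_t())])

# SCAN over the thunks, keeping every intermediate accumulator
acc = col_thunks[0]
scanned = [acc]
for t in col_thunks[1:]:
    acc = concat_thunk(acc, t)
    scanned.append(acc)

# ExpandVλ: unwrap each thunk into a column of the output array
result = [[scanned[c]()[r] for c in range(len(scanned))]
          for r in range(len(testdata))]
```

The double indexing in the last step mirrors the nested INDEX calls in ExpandVλ: first select the thunk, then select the element within its expanded contents.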
I then performed some timing runs using 10 columns but a varying number of rows.
What is troubling is that both this solution and the earlier one scale roughly as O(n^1.7) in the row count (I am more used to seeing integer exponents). The earlier REDUCE formula appears to be more efficient than the one using MAKEARRAY but, either way, the growth means that the approaches are not usable once the row count goes into the 1000s, at which point the time for a recalculation exceeds 1s.
My conclusion is that the array-of-arrays problem needs to be addressed by Microsoft. Otherwise we are likely to find ourselves falling back on copy-fill, which will be a real joy when resizing calculations that run into the 100,000s of rows!
- PeterBartholomew1 · Mar 28, 2022 · Silver Contributor
The tags to the previous post were the creation of the cat!
- lori_m · Mar 28, 2022 · Iron Contributor
Lol... until Microsoft comes up with a better alternative, I'd suggest using the SergeiBaklan formulation or something like below on longer arrays:
=REDUCE(INDEX(data,,1), SEQUENCE(COLUMNS(data)-1), LAMBDA(acc,i, HSTACK(acc, INDEX(acc,,i)&INDEX(data,,i+1) )))
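In Python terms, the REDUCE/HSTACK recurrence looks like the following (a sketch; `data` stands in for the worksheet range): start with the first column and repeatedly stack on a new column formed from the previous column concatenated with the next data column.

```python
# Sketch of the REDUCE/HSTACK recurrence.
data = [["a", "b", "c"],
        ["d", "e", "f"]]

acc = [[row[0]] for row in data]            # INDEX(data,,1)
for i in range(len(data[0]) - 1):           # SEQUENCE(COLUMNS(data)-1)
    # INDEX(acc,,i) & INDEX(data,,i+1): previous column & next data column
    new_col = [acc[r][i] + data[r][i + 1] for r in range(len(data))]
    for r in range(len(data)):              # HSTACK(acc, new_col)
        acc[r].append(new_col[r])
```

Because each step appends one whole column, the work per step is linear in the row count, which is why this formulation scales so much better as rows grow.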
Which performs better will depend on the array dimensions; I'd think the method that stacks along the smaller of the row or column dimensions would be preferable. The other advantage of thunks and the above method is that vector accumulation is supported.
BTW, previous timing results that I posted would need to be updated by removing the 'vstack' name - turns out that prototype function was slowing things down a lot.
- PeterBartholomew1 · Mar 30, 2022 · Silver Contributor
Your recommendation was spot on. The outcome seems to be that the formula now available will have one dimension that grows quadratically whilst the other may be linear. The formula you suggested and attributed to SergeiBaklan grows linearly as the number of rows increases and beats the formulae I was playing with hands down; it should get to a million rows with calculation times under 1s!
Similar behavior is achieved by the variation
= REDUCE(INDEX(data,,1), BYCOL(data,Thunkλ), LAMBDA(acc,ϑ, HSTACK(acc,TAKE(acc,,-1)+ϑ())) )
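Taken as written, the `+` in that variation accumulates numeric running sums across the columns (with `&` it would concatenate text as before). A Python sketch of the as-written behaviour, with names of my own choosing:

```python
# Sketch of the TAKE(acc,,-1) variation: each new column is the last
# accumulated column plus the contents of the next column thunk.
def thunk(x):
    return lambda: x

data = [[1, 2, 3],
        [4, 5, 6]]

# BYCOL(data, Thunkλ): one thunk per column
col_thunks = [thunk([row[c] for row in data]) for c in range(len(data[0]))]

acc = [[row[0]] for row in data]                    # INDEX(data,,1)
for t in col_thunks[1:]:
    last = [acc[r][-1] for r in range(len(acc))]    # TAKE(acc,,-1)
    col = t()
    for r in range(len(acc)):                       # HSTACK(acc, last + col)
        acc[r].append(last[r] + col[r])
```

The result is a row-wise running total across the columns, built one full column per step just like the REDUCE formula above.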
I hadn't spent enough time on the final posts of the first sheet because I was going to use Charles Williams's timing routines and the Microsoft Research timed regression tests made the workbook very slow.