Forum Discussion
TheDub
Mar 11, 2022Iron Contributor
Can SCAN() and BYROW() be combined?
The support page for SCAN() contains the following example:
=SCAN("",A1:C2,LAMBDA(a,b,a&b))
My general issue is combining BYROW() or BYCOLUMN() with other functions, but I'll illustrate it with this SCAN() example:
How does one SCAN() row by row such that the output it something like:
a, ab, abc
d, de, def
=BYROW(A1:C2,LAMBDA(row,SCAN("",row,LAMBDA(a,b,a&b))))
doesn't work...
It looks like it should work and, indeed, a similar formula with REDUCE does work. Unfortunately SCAN produces an array on each row, leading to an array of arrays which hasn't been implemented within the calc engine. There are two distinct approaches to overcome the problem, neither of which is that attractive.
The first is to allow SCAN to run through the 6 values as a single sequence but to reset the resulting text to null whenever the scan returns to the first column. Rather than scanning the text array, I scanned the sequence {0,1,2;3,4,5} which can both be used to look up the text to concatenate and to identify the leading column.
= SCAN("", SEQUENCE(2,3,0), LAMBDA(str,k, LET( row, 1+QUOTIENT(k,3), column, 1+MOD(k,3), chr, INDEX(data, row, column), IF(column>1,str,"")&chr ) ) )
The second approach follows your original formula more closely but, instead of allowing SCAN to build a text array, it builds a Thunk that contains the array. BYROW then creates an array of 2 thunks and if the result is passed to MAKEARRAY each thunk may be singled out by INDEX, then expanded by passing it a null parameter string and wrapped within a further INDEX, which will return a single value to be assembled into the solution array.
Thunkλ = LAMBDA(x,LAMBDA(x)) Forming a thunk: = LET( arrayϑ, Thunkλ(array), Returning its content: = arrayϑ()
I will leave it to you to judge whether the most appropriate answer to your question is 'yes' or 'no'!
- Patrick2788Silver Contributor
My approach with SCAN:
ScanByrow =LAMBDA(matrix,LET( counter, SEQUENCE(ROWS(matrix)), data, HSTACK(counter, matrix), AccumulateText, LAMBDA(a, v, IF(ISNUMBER(v), "", a & v)), Texts, SCAN("", data, AccumulateText), DROP(Texts, , 1) ))
- Levi BaileyCopper Contributor
Here's another abomination of an approach for combining SCAN() and BYROW():
=TEXTSPLIT(TEXTJOIN("~",TRUE,BYROW(A1:C2, LAMBDA(array,TEXTJOIN("|",TRUE,SCAN("",array,LAMBDA(a,b,a&b)))))),"|","~")
It gets around the type incompatibility by joining arrays into strings, processing them, then splitting the string back into an array. This example uses "|" as a column separator and "~" as a row separator.
I was only doing this as a means to an end, and did not optimise much - hopefully someone finds a way to make it such that there is only one TEXTJOIN.
A drawback of this approach is that everything gets compressed into a single string. This makes it hard to follow, and I have not checked but there is likely to be a limit to the length of the string that can be processed properly.Like all the other approaches that I am aware of, it is unwieldy and I would prefer not to burden others with spreadsheets that contain this.
Has anyone found any better approaches since this thread was active a couple of years ago?
- karimbaezCopper Contributor
Levi BaileyI found a way using MakeArray
=LET(data, SEQUENCE(2,3), MAKEARRAY(ROWS(data),COLUMNS(data), LAMBDA(row,col, CONCAT( INDEX(data,row,SEQUENCE(col)) ) ) ) )
Found it on this YouTube video
As discussed above that's the slowest variant from all possible ones.
- PeterBartholomew1Silver Contributor
Hi Levi.
I wouldn't say that much has changed. It really is up to Microsoft to sort the mess that is of their making. I think the problem of helper functions not returning arrays of arrays is getting more important day by day. If one measures the need by the proportion of problems for which the array of arrays is a central feature of the problem which, as the scope of dynamic array formulas increases, may occur several times within a formula and resolving the issue becomes more important. Mind you, I would not expect ideas such as Craig Hatmaker 's 5g functions to fall within the original specification. Most spreadsheet users still are attached to horrible 'and send down' relative references, so it is only a minority that require the DA changes to be brought to fruition.
I had made some progress towards overcoming the problem by using Thunks to hold the inner arrays, and having other user functions specifically written to expand the arrays of thunks to display the content.
I still have some ideas to follow up, for example, expanding and stacking thunks 16 at a time, so 5 REDUCE steps would process an entire Excel column. The challenge is to both achieve acceptable efficiency and to hide the unsellable contortions from the end user!
- karimbaezCopper Contributor
HiTheDub try this method that I found on this youtube video https://www.youtube.com/watch?v=ITg5lfm3d14. it uses the MAKEARRAY function.
Replace this with your array "CHAR(RANDARRAY(10,10,65,90,TRUE))"
=LET( x,CHAR(RANDARRAY(10,10,65,90,TRUE)), MAKEARRAY(ROWS(x),COLUMNS(x), LAMBDA(r,c,CONCAT(INDEX(x,r,SEQUENCE(c))))))
- PeterBartholomew1Silver Contributor
I created another approach to the problem in which many rows are stepped through the SCAN as a block (by again using thunks) and then displayed using MAKEARRAY.
= ExpandVλ(SCAN(,testdataVϑ,ConcatThunkλ)) Where ConcatThunkλ = LAMBDA(aϑ,bϑ,Thunkλ(aϑ()&bϑ())); testdataVϑ = BYCOL(testdata, Thunkλ); ExpandVλ = LAMBDA(dϑ, MAKEARRAY(n,10, LAMBDA(r,c, INDEX(INDEX(dϑ,1,c)(),r,1))) );
I then performed some timing runs using 10 columns but a varying number of rows.
What is troubling is that both this solution and the earlier one are of Order 1.7 (I am more used to seeing integer values). The earlier REDUCE formula appears to be more efficient than that using MAKEARRAY but, either way, the growth means that the approaches are not usable once the row count goes into the 1000s (in which case the time for a recalculation exceeds 1s).
My conclusion is that the array of array problem needs to the addressed by Microsoft. Otherwise we are likely to find ourselves in a situation of falling back on copy fill which will be a real joy when resizing calculations that are running into the 100,000s of rows!
- PeterBartholomew1Silver Contributor
- lori_mSteel Contributor
Lol... until Microsoft comes up with a better alternative, I'd suggest using the SergeiBaklan formulation or something like below on longer arrays:
=REDUCE(INDEX(data,,1), SEQUENCE(COLUMNS(data)-1), LAMBDA(acc,i, HSTACK(acc, INDEX(acc,,i)&INDEX(data,,i+1) )))
Which performs better would depend on array dimensions, I'd think the method that stacks along the smaller of the row or column dimensions would be preferable. The othe advantage of thunks and the above method is that vector accumulation is supported.
BTW, previous timing results that I posted would need to be updated by removing the 'vstack' name - turns out that prototype function was slowing things down a lot.
- Starrysky1988Iron Contributor
AFE is not working in my Excel and I just use name manager.
byR = LAMBDA(L,MAKEARRAY(ROWS(L),COLUMNS(L),LAMBDA(r,c,CONCAT(INDIRECT(ADDRESS(r,1)&":"&ADDRESS(r,c))))))
The formula above will only work if the list is started from cell A1.
To get the dynamic result, I've added another helper name that represents the first cell in the list.
I'd suggest to modify a bit which allows to work both with ranges and arrays.
ByRR = LAMBDA(A, MAKEARRAY( rows(A), columns(A), LAMBDA( n, m, CONCAT( INDEX( A, n, SEQUENCE(m) ) ) ) ) );
Performance is bit better than with thunks or makearray() which uses scan().
- Starrysky1988Iron Contributor
Dear Mr. Sergei Baklan,
Thanks for your highlight and I've learnt something from it. The formula now is perfect.
What I like the most using Excel is the art of combining of different Excel Functions.
If we are using old version of Excel 2016 onwards, we can just simply use "Concat" function as below and copy and paste to the cell we want.
Sometimes, we make things easy, but sometimes we make them difficult.
May be we want to save a few second of extra work and always want dynamic result.
- lori_mSteel Contributor
Following the announcement of the new array functions, I wondered if VSTACK or some other function could be put to use in this type of situation to extend BYROW?
In the absence of a built-in function, maybe define instead:
BYROWS =LAMBDA(array,function, REDUCE(1, SEQUENCE(ROWS(array)), LAMBDA(A,i, IF(SEQUENCE(i)=i, function(INDEX(array,i,)), A))))
which could be used in conjunction with SCAN as shown...
(Out of interest I did rough timings of the various implementations on a 100x100 array)
My first attempt
reduceV= LAMBDA(array, DROP( REDUCE( "", SEQUENCE(ROWS(array)), LAMBDA(a, n, VSTACK( a, SCAN(, CHOOSEROWS(array, n), LAMBDA(acc, v, acc & v)) ) ) ), 1 ) );
which gives
Not sure how to integrate it with timer.
- lori_mSteel Contributor
Great - looks better. and likely faster too. For timing maybe replace "" by 0 (or 1) and '&' with '+' then update function name in timer formula. From the times, it looks like your system is running twice as fast! I needed to reduce to n=50 to make it responsive enough on my setup.
- PeterBartholomew1Silver Contributor
OK. So I am beginning to get fed up with the arrays of arrays problem with the 365 lambda helper functions. Isn't that what one uses spreadsheets for?
Re: merging rows - Microsoft Tech Community
Every table is a list of records. Crosstabs are arrays of arrays. I would say that up to 80% of my workbooks would benefit from being able to handle arrays of arrays efficiently.
OK, thunks do offer a way around the problem and I am getting better at using them. The attached workbook has Lambda functions Thunkλ and ExpandThunkλ that I have used to form and expand the arrays of thunks, but it is still a pain and I struggle to see how I would convince a normal Excel user that this is better than 'and copy down (ugh!)'.
- lori_mSteel Contributor
I had also tried to raise this point several times prior to release. My hope was that something like this would work here,
=SCAN({"";""},data,LAMBDA(a,b,a&b)))
which would be a more efficient implementation of:
=REDUCE("", SEQUENCE(COLUMNS(data)), LAMBDA(acc,i, IF(SEQUENCE(,i)=i, INDEX(acc,,i-1)&INDEX(data,,i), acc)))
Instead a reusable 'SCANBYROW' function can be implemented via one of the methods Peter suggested, or formula above but this is far from ideal.
- TheDubIron Contributor
As the kids nowadays say, SMH... Functions like that REDUCE() are some kind of abomination: six sub-functions (two of which are used twice), eight sets of "( )"s and an endless stream of ",". After a few of those, one's brain just shuts down 🙂
- lori_mSteel ContributorFair enough, I think many other Excellers out there would agree with such sentiments! As per Peter's earlier comments, formula construction is starting to become more the domain of the developer community - similar examples of functional code abound on stackoverflow.
The post was more to illustrate how an array of results could be generated in 2d rather than an array of 1d-arrays which is not allowed. If similar updates were made to SCAN/BYROW/etc functions one wouldn't need to resort to such headache-inducing formulas (though I'm sure the C++ source code of those functions would be many times worse!)
- PeterBartholomew1Silver Contributor
It looks like it should work and, indeed, a similar formula with REDUCE does work. Unfortunately SCAN produces an array on each row, leading to an array of arrays which hasn't been implemented within the calc engine. There are two distinct approaches to overcome the problem, neither of which is that attractive.
The first is to allow SCAN to run through the 6 values as a single sequence but to reset the resulting text to null whenever the scan returns to the first column. Rather than scanning the text array, I scanned the sequence {0,1,2;3,4,5} which can both be used to look up the text to concatenate and to identify the leading column.
= SCAN("", SEQUENCE(2,3,0), LAMBDA(str,k, LET( row, 1+QUOTIENT(k,3), column, 1+MOD(k,3), chr, INDEX(data, row, column), IF(column>1,str,"")&chr ) ) )
The second approach follows your original formula more closely but, instead of allowing SCAN to build a text array, it builds a Thunk that contains the array. BYROW then creates an array of 2 thunks and if the result is passed to MAKEARRAY each thunk may be singled out by INDEX, then expanded by passing it a null parameter string and wrapped within a further INDEX, which will return a single value to be assembled into the solution array.
Thunkλ = LAMBDA(x,LAMBDA(x)) Forming a thunk: = LET( arrayϑ, Thunkλ(array), Returning its content: = arrayϑ()
I will leave it to you to judge whether the most appropriate answer to your question is 'yes' or 'no'!
- TheDubIron ContributorFirst - and obviously - thanks for you thoughtful reply. I'll play with both options against other examples, but my gut feeling is that #2 will come up ahead.
Second - and this is sort of a general rant - having played around with many of the new shiny toys recently introduced to Excel, I must say the whole process feels strange, like "two steps forward - one step back". This one is a great example of Excel being inconsistent in applying its own logic: if something can be done by row (or by column) it should be always doable that way, instead of letting users find out (and memorizing) the circumstances where the logic works and where it doesn't. When you say, about the two approaches (in this case) to overcoming the problem, that "neither of which is that attractive" you are being, I feel, very polite. The near constant need to look for "alternatives", "workarounds", "hacks", etc. to resolve issues which should be straightforward is utterly exhausting....- PeterBartholomew1Silver Contributor
I see your point. The best I can say is that the pitfalls are relatively few.
A bit like traditional Excel where one tries to explain why
= COUNTIFS(MONTH(date),6))
doesn't work. Because CONTIFS doesn't accept array parameters in place of a range reference always sounded a bit lame.
In the present case, I got to
= MAKEARRAY(2,3, Expandλ( BYROW(data, LAMBDA(row,Thunkλ(SCAN("", row, Concatλ)))) )) where Expandλ = LAMBDA(ϑ, LAMBDA(r,c, INDEX(INDEX(ϑ, r,1)(),,c) )); Concatλ = LAMBDA(array,cell, array&cell);
It is heavy going but, once done, it can be reused. Hopefully, not too often!