SOLVED

Can SCAN() and BYROW() be combined?

Brass Contributor

The support page for SCAN() contains the following example: 

914422ab-f843-49ba-b0de-082385f9cd11

                                      =SCAN("",A1:C2,LAMBDA(a,b,a&b))

 

My general issue is combining BYROW() or BYCOLUMN() with other functions, but I'll illustrate it with this SCAN() example:

 

How does one SCAN() row by row such that the output it something like:

a, ab, abc

d, de, def

 

=BYROW(A1:C2,LAMBDA(row,SCAN("",row,LAMBDA(a,b,a&b))))

 

doesn't work...

42 Replies
best response confirmed by TheDub (Brass Contributor)
Solution

@TheDub 

It looks like it should work and, indeed, a similar formula with REDUCE does work.  Unfortunately SCAN produces an array on each row, leading to an array of arrays which hasn't been implemented within the calc engine.  There are two distinct approaches to overcome the problem, neither of which is that attractive. 

The first is to allow SCAN to run through the 6 values as a single sequence but to reset the resulting text to null whenever the scan returns to the first column.  Rather than scanning the text array, I scanned the sequence {0,1,2;3,4,5} which can both be used to look up the text to concatenate and to identify the leading column.

= SCAN("", SEQUENCE(2,3,0),
   LAMBDA(str,k,
      LET(
         row, 1+QUOTIENT(k,3),
         column, 1+MOD(k,3),
         chr, INDEX(data, row, column),
         IF(column>1,str,"")&chr
      )
   )
)

The second approach follows your original formula more closely but, instead of allowing SCAN to build a text array, it builds a Thunk that contains the array.  BYROW then creates an array of 2 thunks and if the result is passed to MAKEARRAY each thunk may be singled out by INDEX, then expanded by passing it a null parameter string and wrapped within a further INDEX, which will return a single value to be assembled into the solution array.

Thunkλ
= LAMBDA(x,LAMBDA(x))

Forming a thunk:
= LET(
    arrayϑ, Thunkλ(array),

Returning its content:
= arrayϑ()

I will leave it to you to judge whether the most appropriate answer to your question is 'yes' or 'no'!

 

First - and obviously - thanks for you thoughtful reply. I'll play with both options against other examples, but my gut feeling is that #2 will come up ahead.

Second - and this is sort of a general rant - having played around with many of the new shiny toys recently introduced to Excel, I must say the whole process feels strange, like "two steps forward - one step back". This one is a great example of Excel being inconsistent in applying its own logic: if something can be done by row (or by column) it should be always doable that way, instead of letting users find out (and memorizing) the circumstances where the logic works and where it doesn't. When you say, about the two approaches (in this case) to overcoming the problem, that "neither of which is that attractive" you are being, I feel, very polite. The near constant need to look for "alternatives", "workarounds", "hacks", etc. to resolve issues which should be straightforward is utterly exhausting....

@TheDub 

I see your point.  The best I can say is that the pitfalls are relatively few. 

A bit like traditional Excel where one tries to explain why

= COUNTIFS(MONTH(date),6))

doesn't work.  Because CONTIFS doesn't accept array parameters in place of a range reference always sounded a bit lame.

 

In the present case, I got to

= MAKEARRAY(2,3, Expandλ(
      BYROW(data, LAMBDA(row,Thunkλ(SCAN("", row, Concatλ))))
   ))

where
Expandλ
= LAMBDA(ϑ, 
    LAMBDA(r,c, 
        INDEX(INDEX(ϑ, r,1)(),,c)
    ));

Concatλ
= LAMBDA(array,cell, array&cell);

It is heavy going but, once done, it can be reused.  Hopefully, not too often!

it seems like your first concept was a cleaner method. Use scan and sequence to perform the scan and a 'reset' for each row something like:

=LET(in,A2:C3,
     r,ROWS(in),
     c,COLUMNS(in),
     s,TEXT(SEQUENCE(r,c,0),"000")&in,
     SCAN("",s,LAMBDA(acc,k,
         IF(MOD(--LEFT(k,3),c),acc,"")&MID(k,4,999))
         )
     )


ok after trying the original idea of using an index() inside the lambda of the original in I got the nested error and had to modify to this 'less attractive' version of concatenating the 'index array' with the 'array of values'.

Sorry for the edit, I made a few tweaks to make it more generalized.

@mtarler 

I still find this class of problem intensely difficult to get my head round and I am frequently guilty of 'reinventing the wheel'.  Unless the solutions can be packaged in the form of meaningful Lambda functions, I suspect the methods lie well beyond the pay grade of the normal Excel developer - and that bearing in mind the assertion of the ICAEW that fewer than 1 in a 1000 Excel users would rate as a developer!

 

All sorts of thoughts float around.  For example, the accumulation may be by concatenation (text), by addition (numbers) or even by multiplication (indices of multidimensional arrays).  Perhaps the operator itself should be a Lambda variable passed into the calculation framework as a parameter (Concatenateλ, Sumλ, Productλ).

 

You were unhappy with the requirement to concatenate the 'index array' with the 'array of values'.  To the best of my knowledge, there is no perfect solution.  For text, a comma-separated list could be used to hold the index/value pair, for numbers, it could be a complex number.  A thunk provides a reasonably standard, if not simple, approach and I think a more normal Lambda function would also be possible, whereby Pairλ(1) is the index and Pairλ(2) the value (INDEX would be inside the Lambda function but the function itself can be passed as a scalar assuming the parameter is missing).

 

Something else I forgot when identifying the possible approaches, was that the second approach of thunks has two distinct implementations.  One builds a column of thunks, each containing an accumulated row, whilst the other builds a row of thunks, each containing a column array.

 

BTW.  My use of 'λ' within the naming might cause confusion for many, who do not use the Greek alphabet to express ideas, but name manager accepts the symbol as just another letter within the name.  Since it is not present as a character on an English language keyboard, I use Autocorrect to convert '\lambda' into 'λ', rather like the equation editor in Word.

As far as I can tell, I am not one of the 10,000 Excel users who rate as a developer by the ICAEW (referred to in your other answer), but I AM a long time, heavy user of Excel so I feel qualified to offer my two cents on this:

 

Having studied your approaches more closely, I have to say that I now tend to prefer your first approach (designated as "Single sequence" in your attachment). Still, the approach involves using 8 different functions (SCAN, SEQUENCE, LABMDA, LET, QUOTIENT, MOD, INDEX and IF)!

The solution proposed by @mtarler - which also works - involves even more (LET,  ROWS, COLUMNS, TEXT,  SEQUENCE, SCAN, LAMBDA, IF, MOD, LEFT and MID) - 11 in total!

 

Just eyeballing these solutions (let alone parsing them) would give your run-of-the-mill non-developer user a headache...

 

What I imagine that type of user would do is just give up on spillable purity, use instead

 

 

 

=SCAN("",A1:C1,LAMBDA(a,b,a&b))

 

 

 

and then drag it down one row - and be done with the whole thing.

@TheDub 

It was as many as 1 in a 1,000 that met the ICAEW criteria of Excel knowledge!

Anyway, I think you are grossly underestimating your own ability.  If it is any consolation, from my initial assessment of your question, I had categorized you as an expert user. 

 

In this instance, I would have some sympathy with the decision to fall back on relative referencing though, since I never use direct cell references (other than within Name Manager to define the Names), I might have

= SCAN("",currentRow, LAMBDA(a,b,a&b))

Where I might put the effort in generate a dynamic array is when there is a follow-up question of "How many more times does the substring "dog" appear within the 5000 rows than "cat".  Another factor might be whether the entire model other than that single concatenation array were dynamic.

 

The question might be, "If someone were to offer you a Lambda function

= HAccumulateλ(initialisationColumn,arrayToAccumulate,Fnλ)

would you use it?"

@TheDub 

Found it!  This discussion  covers much the same ground as one I initiated in the past. 

Accumulating arrays - Microsoft Tech Community

I did have to create 'ConcatThunkλ' by substituting a '+' by '&'. in order to apply the previous function to the new situation!

= AccumulateThunkλ(charArray, charInit, ConcatThunkλ)


where "ConcatThunkλ" 
= LAMBDA(accϑ, strϑ, Thunkλ(accϑ() & strϑ()));

 

 

Well, I'm still trying to parse your function for charArray - I can already see that it involves a bunch of nested lambdas which in turn contain quite a number of functions (BYCOL, SCAN, MAKEARRAY, etc.)...

 

I don't think I'm qualified to have an opinion about whether Excel is now Turing-complete or not (whatever that practically means), but this seems a lot of work for something that in Python, for example, can be accomplished in exactly two lines, one of which is an import statement.

 

BTW, I actually did see and read the discussion you link to - but didn't make the connection; I believe I was (and probably still am) so annoyed by the failure of BYROW to work with SCAN, I couldn't see what was right in front of me!

@TheDub 

The is no need to convince me that the limitation on array of arrays/ranges is both over-restrictive and is now causing difficulties.  I have made such points in the past and @JoeMcDaid explained that Excel never has accepted such data objects; it is just that now one is far more likely to encounter this limitation.

 

At a more mundane level, one can see that

= XLOOKUP(lookupVals,Table1[Name],Table1)

fails, with the output truncated, whilst

= INDEX(
     Table1, 
     XMATCH(lookupVals,Table1[Name]),
     XMATCH(Table1[#Headers],Table1[#Headers])
  )

works, so the problem has earlier manifestations.   What Joe would need is a compelling case for Microsoft to commit resources to the problem and confidence that any change will not cause backward compatibility issues.  My feeling is that it may be sufficient to deal with the problem within the restricted context of the BYROW, BYCOL and MAKEARRAY operators, in order to avoid wider unintended consequences.  After all, for years, SUMPRODUCT (and, I think, AGGREGATE?) broke the rules and allowed array calculation without needing CSE to override implicit intersection, so the idea of exempting specific functions from the rules is not without precedent. 

 

The other thing you need is real-world examples which have been frustrated by the failure of the new helper functions to perform the tasks that they obviously should be capable of.  I think 'send a frown' gets attention, especially if key words channel the query to the correct team.

 

As for the Python 'include', I would prefer to wait and see a solid level on performance at the level of program abstraction currently available to Excel developers before considering even more abstract layers of helper functions.

 

What I am currently doing is to examine the limitations and workarounds, as well as the new opportunities, trying to determine which routes provide best practice.  Nobody would be more pleased than I, if the need for 'too clever by half' solutions were suddenly to disappear.  At least, the practice of building solutions with Lambda functions can conceal the detail and leave more coherent code at user level.

 

@TheDub 

I had also tried to raise this point several times prior to release.  My hope was that something like this would work here,

=SCAN({"";""},data,LAMBDA(a,b,a&b)))

which would be a more efficient implementation of:

=REDUCE("",
     SEQUENCE(COLUMNS(data)),
        LAMBDA(acc,i,
           IF(SEQUENCE(,i)=i,
              INDEX(acc,,i-1)&INDEX(data,,i),
              acc)))

Instead a reusable 'SCANBYROW' function can be implemented via one of the methods Peter suggested, or formula above but this is far from ideal.

As the kids nowadays say, SMH... Functions like that REDUCE() are some kind of abomination: six sub-functions (two of which are used twice), eight sets of "( )"s and an endless stream of ",". After a few of those, one's brain just shuts down :)

While I understand (I hope) what you are saying and even agree with most of it, I'd like to respond to two statements you make:


the idea of exempting specific functions from the rules is not without precedent.

While I appreciate the power of precedent, the fact that a bad idea is not without precedent doesn't necessarily turn the idea from bad to good.


 What Joe would need is ... confidence that any change will not cause backward compatibility issues.

I'm afraid that ship has already sailed. As a matter of fact, it's not too farfetched to imagine that Excel will one day be forked into incompatible Excel Classic and Excel New Age...

Fair enough, I think many other Excellers out there would agree with such sentiments! As per Peter's earlier comments, formula construction is starting to become more the domain of the developer community - similar examples of functional code abound on stackoverflow.

The post was more to illustrate how an array of results could be generated in 2d rather than an array of 1d-arrays which is not allowed. If similar updates were made to SCAN/BYROW/etc functions one wouldn't need to resort to such headache-inducing formulas (though I'm sure the C++ source code of those functions would be many times worse!)

so i tried to follow the discussion. Can someone maybe describe what that Thunk function is/does. It appears to be just a way to pass an array without it knowing it is passing an array? maybe? a lambda that returns a reference to a lambda? who would have even thunk that would work?

@mtarler 

"Who would have even thunk that would work?"  Who indeed? (grin)

It wasn't immediately obvious to me that the concept could be of any use.  I sometimes think of it as the computing equivalent of the freeze dried survival meal.  In its packaged state, it doesn't look like food but add water and bring to the boil and it is reconstituted as if by magic.

 

Before making a determined effort to exploit Thunks, I had used basic Lambda functions for similar purpose with some success, for example

"Arrayλ"
= LAMBDA(k,INDEX(Array,k))

can be passed as a Lambda variable, but

= Arrayλ({1;2;3})

will return the first 3 terms of the array.  This is actually useful when one is handling multidimensional arrays where expressions like

= Arrayλ(3,1,2)

"Arrayλ"
= LAMBDA(i,j,k, INDEX(Array,i+10*(k-1), j))

will successfully use the 3 indices pick terms from the grid.

 

The Thunk is a bit more obscure

"Arrayϑ"
= LAMBDA(Array)

"Thunkλ"
= LAMBDA(x,LAMBDA(x))

"Arrayϑ"
= Thunkλ(Array)

"Worksheet formula"
= Arrayϑ()

The first term is a thunk, but used inside SCAN will return the array and not the thunk.  To do that one can define a Lambda function Thunkλ with 2 Lambdas (thanks @tboulden).  Apply it to the array and it correctly returns the Thunk.  Add a null parameter string and you magically reconstitute the array (just like the pack of "MaryJanesFarm Kettle Chili"!

@TheDub 

OK.  So I am beginning to get fed up with the arrays of arrays problem with the 365 lambda helper functions.  Isn't that what one uses spreadsheets for?

Re: merging rows - Microsoft Tech Community

Every table is a list of records.  Crosstabs are arrays of arrays.  I would say that up to 80% of my workbooks would benefit from being able to handle arrays of arrays efficiently.  

 

OK, thunks do offer a way around the problem and I am getting better at using them.  The attached workbook has Lambda functions Thunkλ and ExpandThunkλ that I have used to form and expand the arrays of thunks, but it is still a pain and I struggle to see how I would convince a normal Excel user that this is better than 'and copy down (ugh!)'.

Following the announcement of the new array functions, I wondered if VSTACK or some other function could be put to use in this type of situation to extend BYROW?

 

In the absence of a built-in function, maybe define instead:

BYROWS
=LAMBDA(array,function,
     REDUCE(1,
         SEQUENCE(ROWS(array)),
         LAMBDA(A,i,
              IF(SEQUENCE(i)=i,
                  function(INDEX(array,i,)),
                  A))))

which could be used in conjunction with SCAN as shown... 

(Out of interest I did rough timings of the various implementations on a 100x100 array)

 

Screenshot 2022-03-24 104213.png

 

 

@lori_m 

My first attempt

reduceV=
LAMBDA(array,
    DROP(
        REDUCE(
            "",
            SEQUENCE(ROWS(array)),
            LAMBDA(a, n,
                VSTACK(
                    a,
                    SCAN(, CHOOSEROWS(array, n),
                        LAMBDA(acc, v, acc & v))
                )
            )
        ),
        1
    )
);

which gives

image.png

Not sure how to integrate it with timer.

1 best response

Accepted Solutions
best response confirmed by TheDub (Brass Contributor)
Solution

@TheDub 

It looks like it should work and, indeed, a similar formula with REDUCE does work.  Unfortunately SCAN produces an array on each row, leading to an array of arrays which hasn't been implemented within the calc engine.  There are two distinct approaches to overcome the problem, neither of which is that attractive. 

The first is to allow SCAN to run through the 6 values as a single sequence but to reset the resulting text to null whenever the scan returns to the first column.  Rather than scanning the text array, I scanned the sequence {0,1,2;3,4,5} which can both be used to look up the text to concatenate and to identify the leading column.

= SCAN("", SEQUENCE(2,3,0),
   LAMBDA(str,k,
      LET(
         row, 1+QUOTIENT(k,3),
         column, 1+MOD(k,3),
         chr, INDEX(data, row, column),
         IF(column>1,str,"")&chr
      )
   )
)

The second approach follows your original formula more closely but, instead of allowing SCAN to build a text array, it builds a Thunk that contains the array.  BYROW then creates an array of 2 thunks and if the result is passed to MAKEARRAY each thunk may be singled out by INDEX, then expanded by passing it a null parameter string and wrapped within a further INDEX, which will return a single value to be assembled into the solution array.

Thunkλ
= LAMBDA(x,LAMBDA(x))

Forming a thunk:
= LET(
    arrayϑ, Thunkλ(array),

Returning its content:
= arrayϑ()

I will leave it to you to judge whether the most appropriate answer to your question is 'yes' or 'no'!

 

View solution in original post