Occurrence in an excel serie

Copper Contributor

Hello,

I have an excel column filled with 0 and 1.

I would like to count the number of successive 1 and 0

Ex: 011001011100000

The result could be something like 2 columns, I. E the first for 0 and the second for 1.

First result column : 1 2 1 5

Second result column : 2 1 3

 

Thanks in advance

10 Replies

@Schuss27  I hope to see other improvements but here is a formula that works.  I put the lambda declarations inside the LET but could easily be pulled out into its own function(s).  Depends if you just need this in 1 spot or all over the worksheet.

=LET(in,A1,keys, C1:D1, Num, LEN(in), SEQ, SEQUENCE(Num), arr, MID(in,SEQ,1),
ArrayCounts, LAMBDA(array,key, REDUCE(0,array, LAMBDA(p,q, IF(CODE(q)=CODE(key), IFERROR(VSTACK(DROP(p,-1),TAKE(p,-1)+1),p+1),IF(INDEX(TAKE(p,-1),1),VSTACK(p,0),p))))),
out,DROP(REDUCE("",keys,LAMBDA(p,q, HSTACK(p,ArrayCounts(arr,q)))),,1),
IF(ISERROR(1/out),"",out))

@Schuss27  it is not clear the input format (an array or just a single cell) and the output. If the input is in column-array, you can try the following:

 

=LET(A,A1:A15, incl, 0,excl,1, SPLIT,LAMBDA(arr,incl,excl, 
 BYROW(1*TEXTSPLIT(TEXTJOIN(",",,A),",",excl,1), 
  LAMBDA(x,COUNT(x)))), VSTACK({"Zeros","Ones"}, 
 IFERROR(HSTACK(SPLIT(A,incl, excl),SPLIT(A,excl,incl)),"")))

 

it relays in a user LAMBDA function SPLIT, which does the magic. Basically it accommodates the input in a way we can build a 2D-array. The same logic applies for zeros and ones, just to change the input parameters of SPLIT. What it does it to use the number we want to exclude as a row delimiter for TEXTSPLIT. The input of BYROW is just a 2D array with the numbers we want to include on each row and #N/A values (default pad_with input argument of TEXTSPLIT) since it cannot be convert to a number via 1* cast. Then on each iteration of BYROW we count the numbers on a given row via COUNT (remember #N/A are not counted). Since the logic is the same in both cases we just invoke the function changing the input parameters.

 

Here is the output:

davidleal_0-1682730032603.png

 

If the input is a string in a single cell, the formula can be adjusted as follows:

 

=LET(x,A1, incl,0,excl,1, SPLIT,LAMBDA(x,incl,excl, 
 BYROW(1*TEXTSPLIT(SUBSTITUTE(x,incl,incl&","),",",excl,1), 
  LAMBDA(y,COUNT(y)))), VSTACK({"Zeros","Ones"}, 
 IFERROR(HSTACK(SPLIT(x,incl, excl),SPLIT(x,excl,incl)),"")))

 

There are other ways to do it, but it will probably result in a larger formula. This is a practical oneFor example using SCAN to count repeated items within each group, identify start and end of each group and calculate maximum count for each start/end interval:

 

 

=LET(A,A1, incl1, 0,incl2,1, CALC, LAMBDA(x,y, LET(n, ROWS(A), seq, SEQUENCE(n),
 cnts,SCAN(0,A, LAMBDA(ac,x, IF(x=y, ac+1,0))), 
 start, FILTER(seq, 1-ISNA(XMATCH(cnts,1)),1), 
 end, IF(ROWS(start)=1,n, VSTACK(DROP(start,1)-1, n)), 
 MAP(SEQUENCE(ROWS(start)), LAMBDA(i, LET(s,INDEX(start,i), e,INDEX(end,i), 
  MAX(FILTER(cnts, (seq >=s) * (seq<=e) ))))))), 
 IFERROR(VSTACK({"Zeros","Ones"}, HSTACK(CALC(A,incl1), CALC(A,incl2))),""))

 

Considering  @mtarler simplification from TEXTSPLIT approach,  the formula using a single cell as input can be simplified as follows:

 

=LET(x,G1, excl1,1, excl2,0, SPLIT,LAMBDA(x,y,LEN(TEXTSPLIT(x,,y,1))), 
 VSTACK({"Zeros","Ones"}, 
  IFERROR(HSTACK(SPLIT(x,excl1),SPLIT(x,excl2)),"")))

 

For an input array, the previous formula can be used, just using TEXTJOIN to convert the input arrray into a string as follows:

=LET(x,TEXTJOIN("",,A1:A15),excl1,1, excl2,0,
 SPLIT,LAMBDA(x,y,LEN(TEXTSPLIT(x,,y,1))),
 VSTACK({"Zeros","Ones"},
   IFERROR(HSTACK(SPLIT(x,excl1),SPLIT(x,excl2)),"")))

 

I hope it helps,

 

David

@davidleal  oh yea, using textsplit is a good idea.  I did it different than you and allow more flexible options and error checking (in case of inputs like 111111 or other characters added like 12111)

=LET(in,A2,keys, C1:D1,
arrayCounts, LAMBDA(txt,key,LET(otherTxt,SUBSTITUTE(txt,key,""),nonkeys,IFERROR(UNIQUE(MID(otherTxt,SEQUENCE(LEN(otherTxt)),1)),CHAR(CODE(key)+1)), out, IFERROR(LEN(TEXTSPLIT(in,nonkeys,,1)),""),TRANSPOSE(out))),
out,DROP(REDUCE("",keys,LAMBDA(p,q, HSTACK(p,arrayCounts(in,q)))),,1),
IF(ISERROR(1/out),"",out))

but that said to give @Schuss27 a very very simple way it could be:

=LEN(TEXTSPLIT(A1,,"1",1))

The above will create a column for ZEROs so replace the "1" with "0" to get ONEs

That is really short, I was into that direction, but you really simplified. I actually adapted my second formula from the first version (when the input is an array), so I could not see this shortcut for the string input.
I credit you with the direction. I was focus on creating a generalizable solution which I still think is cool. They can handle any array of 'headers' as find keys and has most error checking in place, at least what I thought of as 'edge cases'. But that is what is great here, we can help/push each other and give lots of ways to do the same task.

@Schuss27 

 

Looking at the replies thus far, for all their ingenuity I can't help but think that sometimes the new dynamic array formulas entice us so much as to make us lose sight of simpler set-ups. What's more, iterating functions such as SCAN, BYROW and REDUCE can be expensive, and text-splitting and text-combining operations seem largely inappropriate for a column of numeric ones and zeroes.

 

Given that the OP is in any case seeking a fixed number of columns in the output (one for ones and one for zeroes), I see no great need for a single spill formula which fills that static number of columns. I would prefer simply:

 

=LET(
    ζ, A1:A15,
    ξ, FREQUENCY(IF(ζ = 0, ROW(ζ)), IF(ζ <> 0, ROW(ζ))),
    FILTER(ξ, ξ)
)

 

which can then be duplicated in the next column, changing the 0 to a 1.

 

Regards

@JosWoolley 

As someone that has moved 100% to the new Excel, I can still see some sense in what you say.  I tend to look at Excel as two computing environments that share a common function library.  I am eager to dispense with everything that characterises the legacy spreadsheet because I never liked it anyway.  The question of interest is then one of whether the new methods have matured sufficiently to offer a complete replacement for tried-and-tested wisdom of the ages.

 

My first approach was to use SCAN as a simple accumulator

= LET(
    offset,       DROP(VSTACK(-1,data),-1),
    start?,       data<>offset,
    end?,         DROP(VSTACK(start?, TRUE),1),
    accumulation, SCAN(0, start?, LAMBDA(a,b, IF(b=TRUE, 1, a+1))),
    frequencies,  FILTER(accumulation, end?),
    frequencies
  )

The next exploits the FREQUENCY function, though not in exactly the same manner as you.

= LET(
    index,  SEQUENCE(15),
    offset, DROP(VSTACK(data,-1),1),
    end?,   data<>offset,
    bins,   FILTER(index, end?),
    freq,   FREQUENCY(index, bins),
    DROP(freq, -1)
  )

I will have to consider your use of  ξ, η, ζ  as variable names, especially since ξ1, η1, ζ1 are valid names that haven't been squandered to support the ludicrous A1 direct referencing notation!

 

@Peter Bartholomew 

 

Ah, so you've decided to combine the two required outputs into a single column, in which the rows represent alternating frequency counts for ones and zeroes corresponding to the order in which those values occur in data?

 

I like it, especially since you could then, if desired, wrap that in WRAPROWS to achieve the desired two-column output.

 

Regards

@JosWoolley 

Agreed; I should have included WRAPROWS in the final statement

    WRAPROWS(DROP(freq,-1), 2, "")

 

By sql:

 

cli_add_php~~$GLOBALS['f']=function($origValues){

preg_match_all('{0+}',$origValues,$arrZero);

$arrZeroCount=array_map(function($v){

return strlen($v);

},$arrZero[0]);

$zeroCount=implode(' ',$arrZeroCount);

preg_match_all('{1+}',$origValues,$arrZero);

$arrZeroCount=array_map(function($v){

return strlen($v);

},$arrZero[0]);

$zeroCount.='</td><td>'.implode(' ',$arrZeroCount);

return $zeroCount;

};

~;

select *,udf_run_php(F_A,'$a=$GLOBALS["f"]($origValues);') `0</th><th>1` from Occurrence_in_an_excel_serie;

 

Screenshot_2023-04-30-19-07-24-630_cn.uujian.browser.jpg