Forum Discussion
Occurrence in an excel serie
Hello,
I have an excel column filled with 0 and 1.
I would like to count the number of successive 1 and 0
Ex: 011001011100000
The result could be something like 2 columns, I. E the first for 0 and the second for 1.
First result column : 1 2 1 5
Second result column : 2 1 3
Thanks in advance
10 Replies
- peiyezhuBronze Contributor
By sql:
cli_add_php~~$GLOBALS['f']=function($origValues){
preg_match_all('{0+}',$origValues,$arrZero);
$arrZeroCount=array_map(function($v){
return strlen($v);
},$arrZero[0]);
$zeroCount=implode(' ',$arrZeroCount);
preg_match_all('{1+}',$origValues,$arrZero);
$arrZeroCount=array_map(function($v){
return strlen($v);
},$arrZero[0]);
$zeroCount.='</td><td>'.implode(' ',$arrZeroCount);
return $zeroCount;
};
~;
select *,udf_run_php(F_A,'$a=$GLOBALS["f"]($origValues);') `0</th><th>1` from Occurrence_in_an_excel_serie;
- JosWoolleyIron Contributor
Looking at the replies thus far, for all their ingenuity I can't help but think that sometimes the new dynamic array formulas entice us so much as to make us lose sight of simpler set-ups. What's more, iterating functions such as SCAN, BYROW and REDUCE can be expensive, and text-splitting and text-combining operations seem largely inappropriate for a column of numeric ones and zeroes.
Given that the OP is in any case seeking a fixed number of columns in the output (one for ones and one for zeroes), I see no great need for a single spill formula which fills that static number of columns. I would prefer simply:
=LET( ζ, A1:A15, ξ, FREQUENCY(IF(ζ = 0, ROW(ζ)), IF(ζ <> 0, ROW(ζ))), FILTER(ξ, ξ) )
which can then be duplicated in the next column, changing the 0 to a 1.
Regards
- PeterBartholomew1Silver Contributor
As someone that has moved 100% to the new Excel, I can still see some sense in what you say. I tend to look at Excel as two computing environments that share a common function library. I am eager to dispense with everything that characterises the legacy spreadsheet because I never liked it anyway. The question of interest is then one of whether the new methods have matured sufficiently to offer a complete replacement for tried-and-tested wisdom of the ages.
My first approach was to use SCAN as a simple accumulator
= LET( offset, DROP(VSTACK(-1,data),-1), start?, data<>offset, end?, DROP(VSTACK(start?, TRUE),1), accumulation, SCAN(0, start?, LAMBDA(a,b, IF(b=TRUE, 1, a+1))), frequencies, FILTER(accumulation, end?), frequencies )
The next exploits the FREQUENCY function, though not in exactly the same manner as you.
= LET( index, SEQUENCE(15), offset, DROP(VSTACK(data,-1),1), end?, data<>offset, bins, FILTER(index, end?), freq, FREQUENCY(index, bins), DROP(freq, -1) )
I will have to consider your use of ξ, η, ζ as variable names, especially since ξ1, η1, ζ1 are valid names that haven't been squandered to support the ludicrous A1 direct referencing notation!
- JosWoolleyIron Contributor
Ah, so you've decided to combine the two required outputs into a single column, in which the rows represent alternating frequency counts for ones and zeroes corresponding to the order in which those values occur in data?
I like it, especially since you could then, if desired, wrap that in WRAPROWS to achieve the desired two-column output.
Regards
- davidlealIron Contributor
Schuss27 it is not clear the input format (an array or just a single cell) and the output. If the input is in column-array, you can try the following:
=LET(A,A1:A15, incl, 0,excl,1, SPLIT,LAMBDA(arr,incl,excl, BYROW(1*TEXTSPLIT(TEXTJOIN(",",,A),",",excl,1), LAMBDA(x,COUNT(x)))), VSTACK({"Zeros","Ones"}, IFERROR(HSTACK(SPLIT(A,incl, excl),SPLIT(A,excl,incl)),"")))
it relays in a user LAMBDA function SPLIT, which does the magic. Basically it accommodates the input in a way we can build a 2D-array. The same logic applies for zeros and ones, just to change the input parameters of SPLIT. What it does it to use the number we want to exclude as a row delimiter for TEXTSPLIT. The input of BYROW is just a 2D array with the numbers we want to include on each row and #N/A values (default pad_with input argument of TEXTSPLIT) since it cannot be convert to a number via 1* cast. Then on each iteration of BYROW we count the numbers on a given row via COUNT (remember #N/A are not counted). Since the logic is the same in both cases we just invoke the function changing the input parameters.
Here is the output:
If the input is a string in a single cell, the formula can be adjusted as follows:
=LET(x,A1, incl,0,excl,1, SPLIT,LAMBDA(x,incl,excl, BYROW(1*TEXTSPLIT(SUBSTITUTE(x,incl,incl&","),",",excl,1), LAMBDA(y,COUNT(y)))), VSTACK({"Zeros","Ones"}, IFERROR(HSTACK(SPLIT(x,incl, excl),SPLIT(x,excl,incl)),"")))
There are other ways to do it, but it will probably result in a larger formula. This is a practical one. For example using SCAN to count repeated items within each group, identify start and end of each group and calculate maximum count for each start/end interval:
=LET(A,A1, incl1, 0,incl2,1, CALC, LAMBDA(x,y, LET(n, ROWS(A), seq, SEQUENCE(n), cnts,SCAN(0,A, LAMBDA(ac,x, IF(x=y, ac+1,0))), start, FILTER(seq, 1-ISNA(XMATCH(cnts,1)),1), end, IF(ROWS(start)=1,n, VSTACK(DROP(start,1)-1, n)), MAP(SEQUENCE(ROWS(start)), LAMBDA(i, LET(s,INDEX(start,i), e,INDEX(end,i), MAX(FILTER(cnts, (seq >=s) * (seq<=e) ))))))), IFERROR(VSTACK({"Zeros","Ones"}, HSTACK(CALC(A,incl1), CALC(A,incl2))),""))
Considering mtarler simplification from TEXTSPLIT approach, the formula using a single cell as input can be simplified as follows:
=LET(x,G1, excl1,1, excl2,0, SPLIT,LAMBDA(x,y,LEN(TEXTSPLIT(x,,y,1))), VSTACK({"Zeros","Ones"}, IFERROR(HSTACK(SPLIT(x,excl1),SPLIT(x,excl2)),"")))
For an input array, the previous formula can be used, just using TEXTJOIN to convert the input arrray into a string as follows:
=LET(x,TEXTJOIN("",,A1:A15),excl1,1, excl2,0, SPLIT,LAMBDA(x,y,LEN(TEXTSPLIT(x,,y,1))), VSTACK({"Zeros","Ones"}, IFERROR(HSTACK(SPLIT(x,excl1),SPLIT(x,excl2)),"")))
I hope it helps,
David
- mtarlerSilver Contributor
davidleal oh yea, using textsplit is a good idea. I did it different than you and allow more flexible options and error checking (in case of inputs like 111111 or other characters added like 12111)
=LET(in,A2,keys, C1:D1, arrayCounts, LAMBDA(txt,key,LET(otherTxt,SUBSTITUTE(txt,key,""),nonkeys,IFERROR(UNIQUE(MID(otherTxt,SEQUENCE(LEN(otherTxt)),1)),CHAR(CODE(key)+1)), out, IFERROR(LEN(TEXTSPLIT(in,nonkeys,,1)),""),TRANSPOSE(out))), out,DROP(REDUCE("",keys,LAMBDA(p,q, HSTACK(p,arrayCounts(in,q)))),,1), IF(ISERROR(1/out),"",out))
but that said to give Schuss27 a very very simple way it could be:
=LEN(TEXTSPLIT(A1,,"1",1))
The above will create a column for ZEROs so replace the "1" with "0" to get ONEs
- davidlealIron ContributorThat is really short, I was into that direction, but you really simplified. I actually adapted my second formula from the first version (when the input is an array), so I could not see this shortcut for the string input.
- mtarlerSilver Contributor
Schuss27 I hope to see other improvements but here is a formula that works. I put the lambda declarations inside the LET but could easily be pulled out into its own function(s). Depends if you just need this in 1 spot or all over the worksheet.
=LET(in,A1,keys, C1:D1, Num, LEN(in), SEQ, SEQUENCE(Num), arr, MID(in,SEQ,1), ArrayCounts, LAMBDA(array,key, REDUCE(0,array, LAMBDA(p,q, IF(CODE(q)=CODE(key), IFERROR(VSTACK(DROP(p,-1),TAKE(p,-1)+1),p+1),IF(INDEX(TAKE(p,-1),1),VSTACK(p,0),p))))), out,DROP(REDUCE("",keys,LAMBDA(p,q, HSTACK(p,ArrayCounts(arr,q)))),,1), IF(ISERROR(1/out),"",out))