Forum Discussion
Selecting Data with Minimum Standard Deviation
- Oct 19, 2022
In the attached version, I have implemented my VBA algorithm (with StDev_P instead of StDev_S) as a custom function. It is used in an array formula confirmed with Ctrl+Shift+Enter to return the selected values.
alvaro037 Here is an alternate solution. I reduce the number of trials because we know std will be lowest for numbers close together so I sort the numbers and then increment through the list:
=LET(in,B2:K2,
n,5,
sortin, SORT(TRANSPOSE(in)),
start, SEQUENCE(ROWS(sortin)-n),
stdlist, SCAN(0,start,LAMBDA(p,i,STDEV.P(INDEX(sortin,start+i-1)))),
findLow,MATCH(MIN(stdlist),stdlist,0),
INDEX(sortin,start+findLow-1))
see attached
EDIT:
a) please note my SD calculation used STD.P per Joe's suggestion above while Hans' uses STD.S and my solution found a set starting with 8.6-8.9 and Hans' found 8.7-9.0 but the SD is the same in each case.
b) here is an alternate version of the formula that stacks/includes the AVG and SD:
=LET(in,B2:K2,
n,5,
sortin, SORT(TRANSPOSE(in)),
start, SEQUENCE(ROWS(sortin)-n),
stdlist, SCAN(0,start,LAMBDA(p,i,STDEV.P(INDEX(sortin,start+i-1)))),
findLow,MATCH(MIN(stdlist),stdlist,0),
ans,INDEX(sortin,start+findLow-1),
out,VSTACK(ans,"Avg",AVERAGE(ans),"SD",STDEV.P(ans)),
out)
I prefer to have them separate as in the attached file but that is up to you
mtarler wrote: ``we know std will be lowest for numbers close together``
My version of Excel does not support those features.
Which subset of 5 is chosen with the following set of data:
1.1
1.3
1.5
1.7
1.9
100.1
100.2
100.3
100.4
100.5
The correct subset is 100.1 through 100.5 (sd = 0.141421356237312), not 1.1 through 1.9 (sd = 0.282842712474619).
And just for fun, which subset is chosen if the first 5 numbers are 1.1, 1.2, 1.3, 1.4 and 1.5?
In that case, the sd is the same.
- mtarlerOct 19, 2022Silver Contributormy comment was purely to explain why I sort the numbers first and can then increment through that sorted list instead of checking every combination. i.e. only check N-5 (i.e. 5) instead of 252 cases.
As for the answers, it pointed out an error in my formula (i.e. it was offset by 1 and didn't check the first entry correctly) so I edited the above post to have the correction.
That said, my formula finds the first set with the minimum SD while Hans' find a correct set but not necessarily consistent which.