Forum Discussion
Selecting Data with Minimum Standard Deviation
- Oct 19, 2022
In the attached version, I have implemented my VBA algorithm (with StDev_P instead of StDev_S) as a custom function. It is used in an array formula confirmed with Ctrl+Shift+Enter to return the selected values.
alvaro037 Here is an alternate solution. I reduce the number of trials because we know std will be lowest for numbers close together so I sort the numbers and then increment through the list:
=LET(in,B2:K2,
n,5,
sortin, SORT(TRANSPOSE(in)),
start, SEQUENCE(ROWS(sortin)-n),
stdlist, SCAN(0,start,LAMBDA(p,i,STDEV.P(INDEX(sortin,start+i-1)))),
findLow,MATCH(MIN(stdlist),stdlist,0),
INDEX(sortin,start+findLow-1))
see attached
EDIT:
a) please note my SD calculation used STD.P per Joe's suggestion above while Hans' uses STD.S and my solution found a set starting with 8.6-8.9 and Hans' found 8.7-9.0 but the SD is the same in each case.
b) here is an alternate version of the formula that stacks/includes the AVG and SD:
=LET(in,B2:K2,
n,5,
sortin, SORT(TRANSPOSE(in)),
start, SEQUENCE(ROWS(sortin)-n),
stdlist, SCAN(0,start,LAMBDA(p,i,STDEV.P(INDEX(sortin,start+i-1)))),
findLow,MATCH(MIN(stdlist),stdlist,0),
ans,INDEX(sortin,start+findLow-1),
out,VSTACK(ans,"Avg",AVERAGE(ans),"SD",STDEV.P(ans)),
out)
I prefer to have them separate as in the attached file but that is up to you
mtarler wrote:
=LET(in,B2:K2,
n,5,
sortin, SORT(TRANSPOSE(in)),
start, SEQUENCE(ROWS(sortin)-n),
stdlist, SCAN(0,start,LAMBDA(p,i,STDEV.P(INDEX(sortin,start+i-1)))),
findLow,MATCH(MIN(stdlist),stdlist,0),
ans,INDEX(sortin,start+findLow-1),
out,VSTACK(ans,"Avg",AVERAGE(ans),"SD",STDEV.P(ans)),out)
Forgive, but I don't speak the new Excel 365 language.
Please describe in one or two sentences what the code above does.
For example, does it find the contiguous n sorted data with the smallest std dev?
The operative word is "contiguous". Right?
- mtarlerOct 19, 2022Silver Contributor
JoeUser2004 sure. and ugh I found another error in going through it because there were 10 items and looking for 5 I was using the n=5 and the 10-5 (i.e. 5) together. I tweaked the code to account for possible other combinations:
=LET(in,B3:K3, n,5, sortin, SORT(TRANSPOSE(in)), start, SEQUENCE(ROWS(sortin)-n+1), stdlist, SCAN(0,start,LAMBDA(p,i,STDEV.P(INDEX(sortin,SEQUENCE(n,,0)+i)))), findLow,MATCH(MIN(stdlist),stdlist,0), ans,INDEX(sortin,SEQUENCE(n,,0)+findLow), out,VSTACK(ans,AVERAGE(ans),STDEV.P(ans)), TRANSPOSE(out))
line 1: define the input range
line 2: define how many items to select
line 3: rearrange the data from in to be sorted in order
line 4: define a sequence 1,2,3,... up to the length of the input minus N so if you have 100 inputs and selecting 5 then you will want to start at 1,2,3,..., 93,94,95,96 and you stop at 96 because after that you won't have enough left to get 5 items
line 5: create a list of standard deviations for each group of n items starting with index 1 and going to the end of the list
line 6: find the location in the list where the MIN value is located
line 7: create a list of the N values used to create that MIN standard deviation value
line 8: stack the list from 7 and add the average and the standard deviation
alvaro037 please see this formula or the attached file for the corrected formula
- JoeUser2004Oct 19, 2022Bronze Contributor
mtarler wrote:
``you will want to start at 1,2,3,..., 93,94,95,96 and you stop at 96 because after that you won't have enough left to get 5 items
create a list of standard deviations for each group of n items starting with index 1 and going to the end of the list
find the location in the list where the MIN value is located``
So again, in simple terms, looking at the sorted list of data, you find the 5 contiguous data with the minimum std dev. Contiguous means that the sets of data are items {1,2,3,4,5}, {2,3,4,5,6}, {3,4,5,6,7}, etc, {94,95,96,97,98}, {95,96,97,98,99} and {96,97,98,99,100}.
Is that right?
- mtarlerOct 19, 2022Silver Contributorcorrect