## Forum Discussion

alvaro037

Oct 19, 2022Copper Contributor

# Selecting Data with Minimum Standard Deviation

Hi everyone, I'm working on a project but the data collected is more than needed, so I need to choose data with the minimum deviation or least error. Example : Thickness of material A with 10x ...

- Oct 19, 2022
In the attached version, I have implemented my VBA algorithm (with StDev_P instead of StDev_S) as a custom function. It is used in an array formula confirmed with Ctrl+Shift+Enter to return the selected values.

mtarler

Silver Contributor

alvaro037 Here is an alternate solution. I reduce the number of trials because we know std will be lowest for numbers close together so I sort the numbers and then increment through the list:

```
=LET(in,B2:K2,
n,5,
sortin, SORT(TRANSPOSE(in)),
start, SEQUENCE(ROWS(sortin)-n),
stdlist, SCAN(0,start,LAMBDA(p,i,STDEV.P(INDEX(sortin,start+i-1)))),
findLow,MATCH(MIN(stdlist),stdlist,0),
INDEX(sortin,start+findLow-1))
```

see attached

EDIT:

a) please note my SD calculation used STD.P per Joe's suggestion above while Hans' uses STD.S and my solution found a set starting with 8.6-8.9 and Hans' found 8.7-9.0 but the SD is the same in each case.

b) here is an alternate version of the formula that stacks/includes the AVG and SD:

```
=LET(in,B2:K2,
n,5,
sortin, SORT(TRANSPOSE(in)),
start, SEQUENCE(ROWS(sortin)-n),
stdlist, SCAN(0,start,LAMBDA(p,i,STDEV.P(INDEX(sortin,start+i-1)))),
findLow,MATCH(MIN(stdlist),stdlist,0),
ans,INDEX(sortin,start+findLow-1),
out,VSTACK(ans,"Avg",AVERAGE(ans),"SD",STDEV.P(ans)),
out)
```

I prefer to have them separate as in the attached file but that is up to you

JoeUser2004

Oct 19, 2022Bronze Contributor

mtarler wrote:

```
=LET(in,B2:K2,
n,5,
sortin, SORT(TRANSPOSE(in)),
start, SEQUENCE(ROWS(sortin)-n),
stdlist, SCAN(0,start,LAMBDA(p,i,STDEV.P(INDEX(sortin,start+i-1)))),
findLow,MATCH(MIN(stdlist),stdlist,0),
ans,INDEX(sortin,start+findLow-1),
out,VSTACK(ans,"Avg",AVERAGE(ans),"SD",STDEV.P(ans)),out)
```

Forgive, but I don't speak the new Excel 365 language.

Please describe in one or two sentences what the code above does.

For example, does it find the __ contiguous__ n sorted data with the smallest std dev?

The operative word is "contiguous". Right?

- mtarlerOct 19, 2022Silver Contributor
JoeUser2004 sure. and ugh I found another error in going through it because there were 10 items and looking for 5 I was using the n=5 and the 10-5 (i.e. 5) together. I tweaked the code to account for possible other combinations:

`=LET(in,B3:K3, n,5, sortin, SORT(TRANSPOSE(in)), start, SEQUENCE(ROWS(sortin)-n+1), stdlist, SCAN(0,start,LAMBDA(p,i,STDEV.P(INDEX(sortin,SEQUENCE(n,,0)+i)))), findLow,MATCH(MIN(stdlist),stdlist,0), ans,INDEX(sortin,SEQUENCE(n,,0)+findLow), out,VSTACK(ans,AVERAGE(ans),STDEV.P(ans)), TRANSPOSE(out))`

line 1: define the input range

line 2: define how many items to select

line 3: rearrange the data from in to be sorted in order

line 4: define a sequence 1,2,3,... up to the length of the input minus N so if you have 100 inputs and selecting 5 then you will want to start at 1,2,3,..., 93,94,95,96 and you stop at 96 because after that you won't have enough left to get 5 items

line 5: create a list of standard deviations for each group of n items starting with index 1 and going to the end of the list

line 6: find the location in the list where the MIN value is located

line 7: create a list of the N values used to create that MIN standard deviation value

line 8: stack the list from 7 and add the average and the standard deviation

alvaro037 please see this formula or the attached file for the corrected formula

- JoeUser2004Oct 19, 2022Bronze Contributor
mtarler wrote:

``you will want to start at 1,2,3,..., 93,94,95,96 and you stop at 96 because after that you won't have enough left to get 5 items

create a list of standard deviations for each group of n items starting with index 1 and going to the end of the list

find the location in the list where the MIN value is located``

So again,

, looking at the sorted list of data, you find the 5*in simple terms*data with the minimum std dev. Contiguous means that the sets of data are items {1,2,3,4,5}, {2,3,4,5,6}, {3,4,5,6,7}, etc, {94,95,96,97,98}, {95,96,97,98,99} and {96,97,98,99,100}.*contiguous*Is that right?

- mtarlerOct 19, 2022Silver Contributorcorrect