Selecting Data with Minimum Standard Deviation

=LET(in,B2:K2, n,5, sortin, SORT(TRANSPOSE(in)), start, SEQUENCE(ROWS(sortin)-n), stdlist, SCAN(0,start,LAMBDA(p,i,STDEV.P(INDEX(sortin,start+i-1)))), findLow,MATCH(MIN(stdlist),stdlist,0), INDEX(sortin,start+findLow-1))

JoeUser2004
Bronze Contributor
Oct 19, 2022
mtarler wrote:

=LET(in,B2:K2, n,5, sortin, SORT(TRANSPOSE(in)), start, SEQUENCE(ROWS(sortin)-n), stdlist, SCAN(0,start,LAMBDA(p,i,STDEV.P(INDEX(sortin,start+i-1)))), findLow,MATCH(MIN(stdlist),stdlist,0), ans,INDEX(sortin,start+findLow-1), out,VSTACK(ans,"Avg",AVERAGE(ans),"SD",STDEV.P(ans)),out)

Forgive, but I don't speak the new Excel 365 language.

Please describe in one or two sentences what the code above does.

For example, does it find the contiguous n sorted data with the smallest std dev?

The operative word is "contiguous". Right?
- mtarler
  Silver Contributor
  Oct 19, 2022
  JoeUser2004 sure. and ugh I found another error in going through it because there were 10 items and looking for 5 I was using the n=5 and the 10-5 (i.e. 5) together. I tweaked the code to account for possible other combinations:
  =LET(in,B3:K3, n,5, sortin, SORT(TRANSPOSE(in)), start, SEQUENCE(ROWS(sortin)-n+1), stdlist, SCAN(0,start,LAMBDA(p,i,STDEV.P(INDEX(sortin,SEQUENCE(n,,0)+i)))), findLow,MATCH(MIN(stdlist),stdlist,0), ans,INDEX(sortin,SEQUENCE(n,,0)+findLow), out,VSTACK(ans,AVERAGE(ans),STDEV.P(ans)), TRANSPOSE(out))
  line 1: define the input range
  line 2: define how many items to select
  line 3: rearrange the data from in to be sorted in order
  line 4: define a sequence 1,2,3,... up to the length of the input minus N so if you have 100 inputs and selecting 5 then you will want to start at 1,2,3,..., 93,94,95,96 and you stop at 96 because after that you won't have enough left to get 5 items
  line 5: create a list of standard deviations for each group of n items starting with index 1 and going to the end of the list
  line 6: find the location in the list where the MIN value is located
  line 7: create a list of the N values used to create that MIN standard deviation value
  line 8: stack the list from 7 and add the average and the standard deviation
  
  alvaro037 please see this formula or the attached file for the corrected formula
  MinSD.xlsm21 KB
  - JoeUser2004
    Bronze Contributor
    Oct 19, 2022
    mtarler wrote:
    ``you will want to start at 1,2,3,..., 93,94,95,96 and you stop at 96 because after that you won't have enough left to get 5 items
    create a list of standard deviations for each group of n items starting with index 1 and going to the end of the list
    find the location in the list where the MIN value is located``
    
    So again, in simple terms, looking at the sorted list of data, you find the 5 contiguous data with the minimum std dev. Contiguous means that the sets of data are items {1,2,3,4,5}, {2,3,4,5,6}, {3,4,5,6,7}, etc, {94,95,96,97,98}, {95,96,97,98,99} and {96,97,98,99,100}.
    
    Is that right?
JoeUser2004
Bronze Contributor
Oct 19, 2022
mtarler wrote: ``we know std will be lowest for numbers close together``

My version of Excel does not support those features.

Which subset of 5 is chosen with the following set of data:

1.1
1.3
1.5
1.7
1.9
100.1
100.2
100.3
100.4
100.5

The correct subset is 100.1 through 100.5 (sd = 0.141421356237312), not 1.1 through 1.9 (sd = 0.282842712474619).

And just for fun, which subset is chosen if the first 5 numbers are 1.1, 1.2, 1.3, 1.4 and 1.5?

In that case, the sd is the same.
- mtarler
  Silver Contributor
  Oct 19, 2022
  my comment was purely to explain why I sort the numbers first and can then increment through that sorted list instead of checking every combination. i.e. only check N-5 (i.e. 5) instead of 252 cases.
  As for the answers, it pointed out an error in my formula (i.e. it was offset by 1 and didn't check the first entry correctly) so I edited the above post to have the correction.
  That said, my formula finds the first set with the minimum SD while Hans' find a correct set but not necessarily consistent which.

Forum Discussion

Selecting Data with Minimum Standard Deviation

Resources