Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

SOLVED
Home
#
Selecting Data with Minimum Standard Deviation

- Home
- Microsoft Excel
- Excel
- Re: Selecting Data with Minimum Standard Deviation

- Subscribe to RSS Feed
- Mark Discussion as New
- Mark Discussion as Read
- Pin this Discussion for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Oct 19 2022 01:20 AM

Hi everyone,

I'm working on a project but the data collected is more than needed, so I need to choose data with the minimum deviation or least error.

Example :

Thickness of material A with 10x measurement is

A = 8.7 8.9 9.0 9.5 8.4 8.1 8.6 8.7 8.9 9.1

From this I need to select 5 of them with minimum deviation, its AVG and show the data selected. Is there any formula for the selection or any approaches should I take?

Thank you for the attention

Labels:

16 Replies

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Oct 19 2022 01:42 AM

Can you attach a sample file with some of your data (anonymized)?

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Oct 19 2022 02:35 AM

Here is a brute force approach for your specific situation. It uses VBA, so you'll have to allow macros.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Oct 19 2022 07:26 AM

Use StDev_P and STDEV.P (STDEVP), not StDev_S and STDEV.S (STDEV).

StDev_P returns the __ actual__ std dev of a set of numbers.

Use StDev_S when the set of numbers represents a sampling from a larger population, and we __ want to estimate__ the std dev of

That does not apply here (``choose data with the minimum deviation``)

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Oct 19 2022 07:27 AM

Thanks. It's easy to modify the macro - I'll leave that to the OP.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Oct 19 2022 07:51 AM - edited Oct 19 2022 08:03 AM

We might note that @Hans Vogelaar's algorithm always finds (one of) the optimal solutions. And it is quick because COMBIN(10,5) is only 252.

In contrast, the equivalent Solver model does not find the best solution if we use the GRG Nonlinear method. The Evolutionary (random) method does stumble upon (one of) the optimal solution. But it takes a long time (although we can change that using Options).

OTOH, the benefits of using Solver is: (a) it does not require VBA coding; and (b) it adapts easily to different subset sizes.

(We could easily modify @Hans Vogelaar's algorithm to adapt to different "population" sizes.)

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Oct 19 2022 08:29 AM - edited Oct 19 2022 09:10 AM

@alvaro037 Here is an alternate solution. I reduce the number of trials because we know std will be lowest for numbers close together so I sort the numbers and then increment through the list:

```
=LET(in,B2:K2,
n,5,
sortin, SORT(TRANSPOSE(in)),
start, SEQUENCE(ROWS(sortin)-n),
stdlist, SCAN(0,start,LAMBDA(p,i,STDEV.P(INDEX(sortin,start+i-1)))),
findLow,MATCH(MIN(stdlist),stdlist,0),
INDEX(sortin,start+findLow-1))
```

see attached

EDIT:

a) please note my SD calculation used STD.P per Joe's suggestion above while Hans' uses STD.S and my solution found a set starting with 8.6-8.9 and Hans' found 8.7-9.0 but the SD is the same in each case.

b) here is an alternate version of the formula that stacks/includes the AVG and SD:

```
=LET(in,B2:K2,
n,5,
sortin, SORT(TRANSPOSE(in)),
start, SEQUENCE(ROWS(sortin)-n),
stdlist, SCAN(0,start,LAMBDA(p,i,STDEV.P(INDEX(sortin,start+i-1)))),
findLow,MATCH(MIN(stdlist),stdlist,0),
ans,INDEX(sortin,start+findLow-1),
out,VSTACK(ans,"Avg",AVERAGE(ans),"SD",STDEV.P(ans)),
out)
```

I prefer to have them separate as in the attached file but that is up to you

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Oct 19 2022 08:46 AM

Re: Standard Deviation - a manual method

The data was sorted and the end cells were eliminated to reduce the data to five cells.

Only takes two tries...

'---

Nothing Left to Lose

https://1drv.ms/u/s!Au8Lyt79SOuhZw2MCH7_7MuLj04?e=sAwbHU

(*free excel programs*)

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Oct 19 2022 08:56 AM

@mtarler wrote: ``we know std will be lowest for numbers close together``

My version of Excel does not support those features.

Which subset of 5 is chosen with the following set of data:

1.1

1.3

1.5

1.7

1.9

100.1

100.2

100.3

100.4

100.5

The correct subset is 100.1 through 100.5 (sd = 0.141421356237312), not 1.1 through 1.9 (sd = 0.282842712474619).

And just for fun, which subset is chosen if the first 5 numbers are 1.1, 1.2, 1.3, 1.4 and 1.5?

In that case, the sd is the same.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Oct 19 2022 09:09 AM

my comment was purely to explain why I sort the numbers first and can then increment through that sorted list instead of checking every combination. i.e. only check N-5 (i.e. 5) instead of 252 cases.

As for the answers, it pointed out an error in my formula (i.e. it was offset by 1 and didn't check the first entry correctly) so I edited the above post to have the correction.

That said, my formula finds the first set with the minimum SD while Hans' find a correct set but not necessarily consistent which.

As for the answers, it pointed out an error in my formula (i.e. it was offset by 1 and didn't check the first entry correctly) so I edited the above post to have the correction.

That said, my formula finds the first set with the minimum SD while Hans' find a correct set but not necessarily consistent which.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Oct 19 2022 10:40 AM

Thanks everyone @Jan Karel Pieterse @Hans Vogelaar @Joe User @mtarler @Nothing_Left_to_Lose

I've attached the data I have below. I like the approaches of @mtarler and I think it would suits best with my data. I'm not familiar in anyway in VBA so I don't know how to replicate this for multiple dataset. The attached file have what I imagine easily useable.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Oct 19 2022 10:54 AM

I tried to paste formula but I think there is problem because of me using english terms for the functions and don't know the correct translation.

2 changes are

a) change the range for 'in' to be row 3

b) using the 2nd formula above that has both make the last line: TRANSPOSE(out)

2 changes are

a) change the range for 'in' to be row 3

b) using the 2nd formula above that has both make the last line: TRANSPOSE(out)

best response confirmed by
alvaro037* (New Contributor)*

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Oct 19 2022 11:08 AM

SolutionIn the attached version, I have implemented my VBA algorithm (with StDev_P instead of StDev_S) as a custom function. It is used in an array formula confirmed with Ctrl+Shift+Enter to return the selected values.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Oct 19 2022 11:45 AM - edited Oct 19 2022 12:08 PM

@mtarler wrote:

```
=LET(in,B2:K2,
n,5,
sortin, SORT(TRANSPOSE(in)),
start, SEQUENCE(ROWS(sortin)-n),
stdlist, SCAN(0,start,LAMBDA(p,i,STDEV.P(INDEX(sortin,start+i-1)))),
findLow,MATCH(MIN(stdlist),stdlist,0),
ans,INDEX(sortin,start+findLow-1),
out,VSTACK(ans,"Avg",AVERAGE(ans),"SD",STDEV.P(ans)),out)
```

Forgive, but I don't speak the new Excel 365 language.

Please describe in one or two sentences what the code above does.

For example, does it find the __ contiguous__ n sorted data with the smallest std dev?

The operative word is "contiguous". Right?

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Oct 19 2022 12:32 PM

@Joe User sure. and ugh I found another error in going through it because there were 10 items and looking for 5 I was using the n=5 and the 10-5 (i.e. 5) together. I tweaked the code to account for possible other combinations:

```
=LET(in,B3:K3,
n,5,
sortin, SORT(TRANSPOSE(in)),
start, SEQUENCE(ROWS(sortin)-n+1),
stdlist, SCAN(0,start,LAMBDA(p,i,STDEV.P(INDEX(sortin,SEQUENCE(n,,0)+i)))),
findLow,MATCH(MIN(stdlist),stdlist,0),
ans,INDEX(sortin,SEQUENCE(n,,0)+findLow),
out,VSTACK(ans,AVERAGE(ans),STDEV.P(ans)),
TRANSPOSE(out))
```

line 1: define the input range

line 2: define how many items to select

line 3: rearrange the data from in to be sorted in order

line 4: define a sequence 1,2,3,... up to the length of the input minus N so if you have 100 inputs and selecting 5 then you will want to start at 1,2,3,..., 93,94,95,96 and you stop at 96 because after that you won't have enough left to get 5 items

line 5: create a list of standard deviations for each group of n items starting with index 1 and going to the end of the list

line 6: find the location in the list where the MIN value is located

line 7: create a list of the N values used to create that MIN standard deviation value

line 8: stack the list from 7 and add the average and the standard deviation

@alvaro037 please see this formula or the attached file for the corrected formula

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Oct 19 2022 01:11 PM - edited Oct 19 2022 02:18 PM

@mtarler wrote:

``you will want to start at 1,2,3,..., 93,94,95,96 and you stop at 96 because after that you won't have enough left to get 5 items

create a list of standard deviations for each group of n items starting with index 1 and going to the end of the list

find the location in the list where the MIN value is located``

So again, __ in simple terms__, looking at the sorted list of data, you find the 5

Is that right?

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content