Generate Random number with conditions

Copper Contributor

Hi all,

I'm trying to generate 400 random numbers given different means according to conditions.

 

For instance, I would like to generate a random number with mean 48 when Time is below 100. Mean 42 when time is between 1000 and 1700. Mean

Schermata 2022-07-20 alle 17.10.40.png

5 Replies

@Nicoletta991 

 

You say you'd like "to generate a random number with mean 48 when Time is below 100. Mean 42 when time is between 1000 and 1700. Mean...."     [the text disappeared after that last "Mean"]

 

I've placed the emphasis in that quote from you on "A random number." That means a single number. A single number doesn't go conceptually with "mean" however. A mean would come from a set of numbers. 

 

And random numbers in Excel are generated between a low number and a high number, not around a "mean"......

 

So why don't we go back to the full context here: what are you actually trying to accomplish with that request? Why differentiate the criteria for the randomness in that way? etc.

 

 

@Nicoletta991  wrote:  ``generate 400 random numbers [....] with mean 48 when Time is below 100. Mean 42 when time is between 1000 and 1700. Mean [sic; incomplete]``

 

I interpret that to mean:  for all the random numbers associated with Time between 1000 and 1700, for example, their mean should be 42.

 

(Note that the column labeled Time appears to contain random numbers.)

 

It is impossible to offer a turnkey solution because of all the details that you omit.  For example:

 

1. What version(s) of Excel do you use?  I am sure a better solution is possible with Office 365 Excel, especially a "beta" version.  I do not have that.

 

2. What should the mean of the numbers be when Time is 100 or more and less than 1000?  Or is "100" a typo?

 

3. What should the mean of the numbers be when Time is more than 1700?

 

4. Do you require an exact mean of 42 when Time is between 1000 and 1700, for example?  Or an approximate mean of 42?  If the latter, with what degree of confidence (e.g. 95%)?

 

5. What is the acceptable range of random numbers in each category?  For example, they must be non-negative (i.e. zero or more)?  Or they must be positive (excludes zero)? They must be integers?  If they can be non-integers, with what precision? Etc, etc, etc.

 

6. Do you require an Excel-only solution?  Or would a VBA solution be acceptable?

 

7.  Etc, etc, etc.  (I am probably forgetting some details.)

 

For starters, if I want to generate 20 random non-negative integers (i.e. zero or more) with a mean of exactly 48, that is the same as saying that I want the 20 integers to sum to 960 (20*48).

 

I would accomplish that with the following paradigm.

 

JoeUser_0-1658335424701.png

Formulas:

B3:  =B2*B1

A6:  =ROUND($B$3*SUM($B$6:B6)/$C$6 - SUM($A$5:A5), 0)

B6:  =RAND()

C6:  =SUM(B6:B25)

C7:  =SUM(A6:A25)

C8:  =AVERAGE(A6:A25)

Copy A6:B6 down the columns

 

That is only one part of the solution, at best.  But I hope it helps.

 

John you have good points. I believe what they are looking for is to generate a column of random numbers that are dependent on the time column so if you group all the generated random numbers that are next to a time <100 those random numbers will have a mean of 42.
That said, you could easily use conditionals and multiply:
=2*RAND()*IFS( [time]<100, 48, AND([time]>1000,[time]<1700), 42, .... )
so basically RAND() will give a number 0 to 1 so multiplying by 2 makes it between 0 and 2 then multiplying by the "mean" you want will make it centered on that intended mean (range 0 to 2xmean) of you could also:
=(RAND()-0.5) * [range] + IFS( [time]<100, 48, AND([time]>1000,[time]<1700), 42, .... )
In this version you can set a [range] so it will be centered around the mean and be +/- that [range] value.
NOTE neither case will GUARANTEE a mean of the desired value just have random numbers that are generate 'centered' on that mean.
Also, this will generate a "non-normal" distribution because it will equally likely to any number in the range and not have a "normal" distribution.
the whole intended distribution needs to be addressed and if the mean MUST be = to the intended mean.

@mtarler wrote:

I believe what they are looking for is to generate a column of random numbers that are dependent on the time column so if you group all the generated random numbers that are next to a time <100 those random numbers will have a mean of 42.

 

And I think you're right, Matt,  with the next batch having a mean of 48...and so forth.

 

A point I didn't make, however--because it all needs better definition (IMHO)--is that a group of random numbers with a mean of 42 just isn't going to be all that much different from a group of random numbers with a mean of 48--there just isn't enough variance between those two "means," UNLESS the range of randomness is greatly restricted and the random numbers aren't integers... so, I repeat, greater definition is required!!

@Nicoletta991  Previously, I wrote:

``For starters, if I want to generate 20 random non-negative integers (i.e. zero or more) with a mean of exactly 48, that is the same as saying that I want the 20 integers to sum to 960 (20*48).  I would accomplish that with the following paradigm.``

 

The following is an application of that paradigm to your data and requirements, as I understand them.  Also see the attached Excel file.  Of course, you should take this with a grain of salt, pending answers to the many questions that I asked earlier.

 

JoeUser_0-1658349297935.png

Key Formulas:

C2:  =ROUND(INDEX($L$2:$L$4, E2) * SUMIFS($F$2:F2, $E$2:E2, E2)

         / INDEX($M$2:$M$4, E2) - SUMIFS($C$1:C1,$E$1:E1, E2), 0)

E2:  =MATCH(B2, $I$2:$I$4)

F2:  =RAND()

K2:  =COUNTIFS($B$2:$B$401, ">=" & I2, $B$2:$B$401, "<" & I3)

K4:  =COUNTIF($B$2:$B$401, ">=" & I4)

L2:  =J2*K2

M2:  =SUMIFS($F$2:$F$401, $B$2:$B$401, ">=" & I2, $B$2:$B$401, "<" & I3)

M4:  =SUMIFS($F$2:$F$401, $B$2:$B$401, ">=" & I4)

Other Formulas:

B12:  =CHOOSE(RANDBETWEEN(1,3), RANDBETWEEN(1,999),

           RANDBETWEEN(1000,1700), RANDBETWEEN(1701,5000))

O2:  =SUMIFS($C$2:$C$401, $B$2:$B$401, ">=" & I2, $B$2:$B$401, "<" & I3)

O4:  =SUMIFS($C$2:$C$401, $B$2:$B$401, ">=" & I4)

P2:  =AVERAGEIFS($C$2:$C$401, $B$2:$B$401, ">=" & I2, $B$2:$B$401, "<" & I3)

P4:  =AVERAGEIFS($C$2:$C$401, $B$2:$B$401, ">=" & I4)

 

For the example, I generate random times starting in B12; and the required mean in J4 is an arbitrary choice.  Of course, you would enter the actual data or substitute the formulas that you already use.

 

The table in columns H:M is a lookup table.  For the time ranges in column I, I assumed that the "100" that you wrote is a typo, and it should be 1000.

 

In column E ("idx"), we determine the table index that corresponds to the time in column B.

 

The table index is used in the formulas in column C to generate the random numbers.  In recent versions of Excel, you could use the LET function to calculate "idx" and assign it to a LET variable.  That would obviate the need for column E.

 

The formulas in columns O:P are just for "proof of concept".  They are not necessary.