Forum Discussion

charliek08's avatar
charliek08
Copper Contributor
Feb 16, 2024

Remove Duplicates from Random Sample Prior to Sampling

I am using the sampling tool in the data analysis toolpak add-in to create a random sample. In this example I had a population of 116 and need exactly 29 unique samples chosen from this 116. I do not see any way to stop duplicates until after the sample has been created. When I performed this sample I got 7 duplicates, so my sample size ended up being only 22, which will not suffice. Is there a solution to this problem?

1 Reply

  • Rodrigo_'s avatar
    Rodrigo_
    Steel Contributor
    Try:
    =INDEX(SORTBY(A2:A116, RANDARRAY(ROWS(A2:A116))), SEQUENCE(29))

    A2:A116 is the range of your data, and
    29 is your desired sample size
    RANDARRAY function generate an array of random numbers with the same number of rows as your data.
    SORTBY function to sort your data based on the random numbers generated by RANDARRAY. It will shuffle your data.

Resources