Assumptions for the Hypergeometric Distribution:

  1. Finite Population:
    • The population or set to be sampled consists of $N$ individuals, objects, or elements.
  2. Characterization of Individuals:
    • Each individual can be classified as a success ($S$) or a failure ($F$).
    • There are $M$ successes in the population.
  3. Sampling Without Replacement:
    • A sample of $n$ individuals is selected without replacement.
    • Each subset of size $n$ is equally likely to be chosen.
  • Random Variable of Interest:
    • $X$: the number of successes in the sample.
    • Probability distribution of $X$:
      • Depends on parameters $n$, $M$, and $N$.
      • Expressed as $P(X = x) = h(x; n, M, N)$.

EX 3.34 printer problem

Possible Values of $X$:

  1. Upper Limit of $X$:
    • If $n \le M$:
      • The largest possible value is $n$.
    • If $M < n$:
      • The largest possible value is $M$.
  2. Lower Limit of $X$:
    • If $n \le N - M$:
      • The smallest possible value is 0 (all sampled may be failures).
    • If $N - M < n$:
      • The smallest possible value is $n - (N - M)$.
  3. Restrictions on $x$:
    • The values of $x$ must satisfy: $\max(0,\, n - (N - M)) \le x \le \min(n, M)$.
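The two limits combine into the range $\max(0,\, n - (N - M)) \le x \le \min(n, M)$. A minimal Python sketch of that range follows; the function name `hypergeom_support` and the parameter values in the usage lines are illustrative assumptions, not taken from the examples in these notes.

```python
def hypergeom_support(n, M, N):
    """Return the possible values of X for sample size n,
    M successes in the population, and population size N."""
    lowest = max(0, n - (N - M))   # forced successes once failures run out
    highest = min(n, M)            # cannot exceed sample size or success count
    return range(lowest, highest + 1)


# Illustrative values only (assumed, not from the notes).
print(list(hypergeom_support(n=5, M=12, N=20)))   # plenty of failures: starts at 0
print(list(hypergeom_support(n=10, M=4, N=12)))   # only 8 failures: starts above 0
```

In the second call there are only 8 failures, so a sample of 10 must contain at least 2 successes, and it can contain at most the 4 successes available.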

Probability Mass Function (pmf):

  • An argument similar to previous examples leads to the derivation of the pmf of $X$.
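The counting argument can be checked by brute force on a small population. The sketch below is only an illustration under assumed values ($N = 6$, $M = 3$, $n = 2$, chosen here and not taken from the notes): it lists every equally likely subset of size $n$, tallies how many contain exactly $x$ successes, and compares each tally with the product $\binom{M}{x}\binom{N-M}{n-x}$.

```python
from collections import Counter
from itertools import combinations
from math import comb


def subset_counts(n, M, N):
    """Count subsets of size n by their number of successes.

    The population is labelled 0..N-1; the first M labels are successes.
    """
    counts = Counter()
    for subset in combinations(range(N), n):
        successes = sum(1 for i in subset if i < M)
        counts[successes] += 1
    return counts


# Assumed toy values for illustration.
n, M, N = 2, 3, 6
counts = subset_counts(n, M, N)
for x in sorted(counts):
    # Enumerated count next to the counting-rule product.
    print(x, counts[x], comb(M, x) * comb(N - M, n - x))
print("total subsets:", sum(counts.values()), "vs", comb(N, n))
```

Dividing each tally by the total number of subsets gives the probability of exactly $x$ successes, because every subset is equally likely.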

hypergeometric distribution

If $X$ is the number of $S$'s in a completely random sample of size $n$ drawn from a population consisting of $M$ $S$'s and $(N - M)$ $F$'s, then the probability distribution of $X$, called the hypergeometric distribution, is given by

$$\begin{aligned} P(X = x) &= h(x; n, M, N) \\ &= \frac{\binom{M}{x} \binom{N - M}{n - x}}{\binom{N}{n}} \end{aligned} \tag{3.15}$$

for $x$ an integer satisfying $\max(0,\, n - N + M) \le x \le \min(n, M)$.

In EX 3.34 printer problem,

  • the values of $n$, $M$, and $N$ are fixed by the problem statement,
  • so $P(X = x) = h(x; n, M, N)$ can be obtained by substituting these numbers into Equation (3.15).
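Substituting into Equation (3.15) is easy to sketch with the standard-library function `math.comb`. The function name `hypergeom_pmf` and the values $n = 3$, $M = 4$, $N = 10$ below are assumptions made for illustration; they are not the numbers from the printer example.

```python
from math import comb


def hypergeom_pmf(x, n, M, N):
    """h(x; n, M, N): probability of x successes in a sample of size n
    drawn without replacement from N individuals, M of them successes."""
    if x < max(0, n - (N - M)) or x > min(n, M):
        return 0.0  # x lies outside the possible values of X
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)


# Assumed illustrative parameters.
n, M, N = 3, 4, 10
for x in range(n + 1):
    print(x, hypergeom_pmf(x, n, M, N))
```

With these assumed values the four probabilities are 20/120, 60/120, 36/120, and 4/120, which sum to 1.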

EX 3.35 animal population

Remark

Various statistical software packages will easily generate hypergeometric probabilities (tabulation is cumbersome because of the three parameters).

As in the binomial case, there are simple expressions for $E(X)$ and $V(X)$ for hypergeometric rv's.

Proposition

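The mean and variance of the hypergeometric rv $X$ having pmf $h(x; n, M, N)$ are

$$E(X) = n \cdot \frac{M}{N}, \qquad V(X) = \left( \frac{N - n}{N - 1} \right) \cdot n \cdot \frac{M}{N} \cdot \left( 1 - \frac{M}{N} \right)$$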

  • The ratio $M/N$ is the proportion of $S$'s in the population.
  • Replacing $M/N$ by $p$ in $E(X)$ and $V(X)$ gives

$$E(X) = np, \qquad V(X) = \left( \frac{N - n}{N - 1} \right) \cdot np(1 - p) \tag{3.16}$$
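As a sanity check, the closed forms $E(X) = np$ and $V(X) = \frac{N - n}{N - 1}\, np(1 - p)$ with $p = M/N$ can be compared against moments computed directly from the pmf. This is a minimal sketch; the helper names and the parameters $n = 3$, $M = 4$, $N = 10$ are assumptions chosen for illustration.

```python
from math import comb


def hypergeom_pmf(x, n, M, N):
    """h(x; n, M, N) for x within the possible values of X."""
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)


def moments_from_pmf(n, M, N):
    """Mean and variance by direct summation over the support."""
    xs = range(max(0, n - (N - M)), min(n, M) + 1)
    mean = sum(x * hypergeom_pmf(x, n, M, N) for x in xs)
    var = sum((x - mean) ** 2 * hypergeom_pmf(x, n, M, N) for x in xs)
    return mean, var


def moments_from_formula(n, M, N):
    """Mean and variance from the closed-form expressions."""
    p = M / N
    return n * p, (N - n) / (N - 1) * n * p * (1 - p)


# Assumed illustrative parameters.
print(moments_from_pmf(3, 4, 10))
print(moments_from_formula(3, 4, 10))
```

Both calls should agree up to floating-point rounding (mean 1.2 and variance 0.56 for these assumed values).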

Comparison of Binomial and Hypergeometric Random Variables:

  1. Means:
    • The means of both the binomial and hypergeometric rv's are equal: $E(X) = np$.
  2. Variances:
    • The variances differ by a finite population correction factor: $(N - n)/(N - 1)$.
    • For $n > 1$ this factor is less than 1, so the hypergeometric variable has smaller variance than the binomial variable.
  3. Finite Population Correction Factor:
    • Can also be expressed as: $(1 - n/N)/(1 - 1/N)$.
    • Approximately equal to 1 when $n$ is small relative to $N$.
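The behaviour of the finite population correction factor is easy to tabulate. The sketch below evaluates both forms, $(N - n)/(N - 1)$ and $(1 - n/N)/(1 - 1/N)$, for an assumed population size $N = 1000$ and a few assumed sample sizes; none of these numbers come from the notes.

```python
def fpc(n, N):
    """Finite population correction factor (N - n) / (N - 1)."""
    return (N - n) / (N - 1)


def fpc_alt(n, N):
    """The same factor written as (1 - n/N) / (1 - 1/N)."""
    return (1 - n / N) / (1 - 1 / N)


# Assumed illustrative population and sample sizes.
N = 1000
for n in (10, 50, 500):
    print(n / N, fpc(n, N), fpc_alt(n, N))
```

The factor stays close to 1 while $n/N$ is small and drops toward 0 as the sample approaches the whole population, at which point the hypergeometric variance is much smaller than the binomial one.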

EX 3.36 (EX 3.35 continued)

Rule for Approximating Hypergeometric by Binomial Distribution:

  1. General Guideline:
    • If sampling is without replacement and the ratio $n/N$ is at most 0.05, the binomial distribution can be used to approximate probabilities related to the number of successes in the sample.
  2. More Precise Statement:
    • As both the population size $N$ and the number of successes $M$ increase, with the ratio $M/N$ approaching a probability $p$: $h(x; n, M, N)$ approaches the binomial pmf $b(x; n, p)$.
    • This approximation holds when $n$ is small relative to $N$ and $p$ is not too close to 0 or 1.
  3. Rationale:
    • When $n$ is small relative to $N$, removing the sampled individuals barely changes the proportion of successes left in the population, so successive draws are nearly independent with nearly constant success probability $p$, which is the binomial setting.
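A quick numerical comparison illustrates the rule. The sketch below computes the largest pointwise gap between the hypergeometric pmf and the binomial pmf with $p = M/N$, for a fixed sample size and growing populations. The helper names and all parameter values are assumptions chosen for illustration.

```python
from math import comb


def hypergeom_pmf(x, n, M, N):
    """h(x; n, M, N), zero outside the possible values of X."""
    if x < max(0, n - (N - M)) or x > min(n, M):
        return 0.0
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)


def binom_pmf(x, n, p):
    """Binomial pmf b(x; n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def max_abs_gap(n, M, N):
    """Largest |h(x; n, M, N) - b(x; n, M/N)| over x = 0..n."""
    p = M / N
    return max(
        abs(hypergeom_pmf(x, n, M, N) - binom_pmf(x, n, p))
        for x in range(n + 1)
    )


# Assumed values: n = 5 and p = 0.4 held fixed while N grows.
for N in (20, 100, 1000):
    M = int(0.4 * N)
    print(N, 5 / N, max_abs_gap(5, M, N))
```

The gap shrinks as $n/N$ falls. At $N = 20$ the ratio $n/N = 0.25$ is well above the 0.05 guideline and the two pmfs differ visibly, while at $N = 1000$ they nearly coincide.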