The CLT can be used to justify the normal approximation to the binomial distribution discussed in Chapter 4. Recall that a binomial variable $X$ is the number of successes in a binomial experiment consisting of $n$ independent success/failure trials with $p = P(S)$ for any particular trial. Define a new rv $X_1$ by

$$X_1 = \begin{cases} 1 & \text{if the 1st trial results in a success} \\ 0 & \text{if the 1st trial results in a failure} \end{cases}$$

and define $X_2, X_3, \ldots, X_n$ analogously for the other $n - 1$ trials. Each $X_i$ indicates whether or not there is a success on the corresponding trial.

Because the trials are independent and $P(S)$ is constant from trial to trial, the $X_i$’s are iid (a random sample from a Bernoulli distribution). The CLT then implies that if $n$ is sufficiently large, both the sum and the average of the $X_i$’s have approximately normal distributions. When the $X_i$’s are summed, a 1 is added for every $S$ that occurs and a 0 for every $F$, so $X_1 + \cdots + X_n = X$. The sample mean of the $X_i$’s is $X/n$, the sample proportion of successes. That is, both $X$ and $X/n$ are approximately normal when $n$ is large. The necessary sample size for this approximation depends on the value of $p$: When $p$ is close to .5, the distribution of each $X_i$ is reasonably symmetric (see Figure 5.20), whereas the distribution is quite skewed when $p$ is near 0 or 1. Using the approximation only if both $np \geq 10$ and $n(1 - p) \geq 10$ ensures that $n$ is large enough to overcome any skewness in the underlying Bernoulli distribution.
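The effect of skewness on the approximation can be checked numerically. The following is a minimal sketch, not a definitive implementation: it compares the exact binomial cdf with a normal cdf having mean $np$ and standard deviation $\sqrt{np(1-p)}$. The function names, the default threshold of 10 in `rule_of_thumb_ok`, and the use of a continuity correction (adding .5 before standardizing) are assumptions made here for illustration.

```python
import math


def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Bin(n, p), summed directly from the pmf."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))


def normal_cdf(z):
    """Standard normal cdf, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def binom_cdf_normal_approx(k, n, p):
    """Normal approximation to P(X <= k); the +0.5 is a continuity correction
    (an assumption of this sketch)."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return normal_cdf((k + 0.5 - mu) / sigma)


def rule_of_thumb_ok(n, p, threshold=10):
    """True when both np and n(1 - p) reach the threshold (assumed default: 10)."""
    return n * p >= threshold and n * (1 - p) >= threshold


def max_abs_cdf_error(n, p):
    """Largest gap between the exact and approximate cdf over k = 0, ..., n."""
    return max(
        abs(binom_cdf(k, n, p) - binom_cdf_normal_approx(k, n, p))
        for k in range(n + 1)
    )


if __name__ == "__main__":
    for n, p in [(50, 0.5), (50, 0.05)]:
        print(n, p, rule_of_thumb_ok(n, p), max_abs_cdf_error(n, p))
```

With $n = 50$, the symmetric case $p = .5$ satisfies the rule of thumb and the two cdfs agree closely, whereas $p = .05$ fails it and the largest discrepancy is noticeably bigger.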

Figure 5.20 Two Bernoulli distributions: (a) $p = .4$ (reasonably symmetric); (b) $p = .1$ (very skewed)

Consider $n$ independent Poisson rv’s $X_1, \ldots, X_n$, each having mean value $\mu/n$. It can be shown that $X = X_1 + \cdots + X_n$ has a Poisson distribution with mean value $\mu$ (because in general a sum of independent Poisson rv’s has a Poisson distribution). The CLT then implies that a Poisson rv with sufficiently large $\mu$ has approximately a normal distribution. A common rule of thumb for this is $\mu > 20$.
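As a rough numerical check of the Poisson case, the sketch below compares the exact Poisson cdf with a normal cdf whose mean and variance both equal the Poisson mean. The function names and the continuity correction are assumptions of this example, and the two mean values used (2 and 25) are chosen only for illustration.

```python
import math


def poisson_cdf(k, mu):
    """Exact P(X <= k) for X ~ Poisson(mu), built up term by term."""
    term = math.exp(-mu)  # P(X = 0)
    total = term
    for j in range(1, k + 1):
        term *= mu / j  # P(X = j) = P(X = j - 1) * mu / j
        total += term
    return total


def poisson_cdf_normal_approx(k, mu):
    """Normal approximation with mean mu and variance mu; the +0.5 is a
    continuity correction (an assumption of this sketch)."""
    z = (k + 0.5 - mu) / math.sqrt(mu)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def max_abs_poisson_error(mu):
    """Largest cdf gap over k = 0, ..., roughly mu + 10 standard deviations."""
    upper = int(mu + 10 * math.sqrt(mu)) + 1
    return max(
        abs(poisson_cdf(k, mu) - poisson_cdf_normal_approx(k, mu))
        for k in range(upper + 1)
    )


if __name__ == "__main__":
    for mu in (2, 25):
        print(mu, max_abs_poisson_error(mu))
```

The largest discrepancy shrinks as the mean grows, which is consistent with restricting the approximation to Poisson variables with a large mean.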

Lastly, recall from Section 4.5 (Other Continuous Distributions) that $X$ has a lognormal distribution if $\ln(X)$ has a normal distribution. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution for which only positive values are possible [$P(X_i > 0) = 1$]. Then if $n$ is sufficiently large, the product $Y = X_1 X_2 \cdots X_n$ has approximately a lognormal distribution.

To verify this, note that

$$\ln(Y) = \ln(X_1) + \ln(X_2) + \cdots + \ln(X_n)$$

Since $\ln(Y)$ is a sum of independent and identically distributed rv’s [the $\ln(X_i)$’s], it is approximately normal when $n$ is large, so $Y$ itself has approximately a lognormal distribution. As an example of the applicability of this result, Bury (Statistical Models in Applied Science, Wiley, p. 590) argues that the damage process in plastic flow and crack propagation is a multiplicative process, so that variables such as percentage elongation and rupture strength have approximately lognormal distributions.
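The product result can be illustrated with a small simulation. This is only a sketch under stated assumptions: the factors are drawn from a uniform distribution on (.5, 1.5), chosen here simply because it takes only positive values, and the sample skewness is used as an informal symmetry check. The function names, the seed, and the values $n = 50$ and 2000 replications are hypothetical choices for illustration.

```python
import math
import random


def sample_skewness(values):
    """Moment-based sample skewness: m3 / m2^(3/2)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2**1.5


def simulate_log_products(n, reps, seed=0):
    """Return ln(Y) for `reps` simulated products Y of n positive factors.

    Each factor is Uniform(0.5, 1.5), an assumed distribution for this sketch.
    Working on the log scale turns the product into a sum.
    """
    rng = random.Random(seed)
    return [
        sum(math.log(rng.uniform(0.5, 1.5)) for _ in range(n))
        for _ in range(reps)
    ]


log_products = simulate_log_products(n=50, reps=2000)
products = [math.exp(v) for v in log_products]

skew_log = sample_skewness(log_products)  # should be near 0 if ln(Y) is roughly normal
skew_raw = sample_skewness(products)      # Y itself is expected to be right-skewed

if __name__ == "__main__":
    print("skewness of ln(Y):", skew_log)
    print("skewness of Y:    ", skew_raw)
```

A skewness near 0 for the simulated values of $\ln(Y)$, together with a strongly positive skewness for the products themselves, is what the lognormal approximation would lead one to expect.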