The observations in a single sample were denoted in Chapter 1 by $x_1, x_2, \ldots, x_n$. Consider selecting two different samples of size $n$ from the same population distribution. The $x_i$'s in the second sample will virtually always differ at least a bit from those in the first sample. For example, a first sample of $n = 3$ cars of a particular type might result in one set of fuel efficiencies $x_1$, $x_2$, $x_3$, whereas a second sample may give three different values. Before we obtain data, there is uncertainty about the value of each $x_i$. Because of this uncertainty, before the data becomes available we now regard each observation as a random variable and denote the sample by $X_1, X_2, \ldots, X_n$ (uppercase letters for random variables).

This variation in observed values in turn implies that the value of any function of the sample observations (such as the sample mean, sample standard deviation, or sample fourth spread) also varies from sample to sample.

That is, prior to obtaining $x_1, \ldots, x_n$, there is uncertainty as to the value of $\bar{x}$, the value of $s$, and so on.
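The sample-to-sample variation described above can be seen in a small simulation. This is only a sketch: the normal population, its mean of 30 and standard deviation of 1.5, the sample size of 3, and the helper names `draw_sample` and `summarize` are all assumptions made for illustration, not values from the text.

```python
import random
import statistics

def draw_sample(rng, n=3, mu=30.0, sigma=1.5):
    """Draw n hypothetical fuel efficiencies from an assumed normal population."""
    return [rng.gauss(mu, sigma) for _ in range(n)]

def summarize(sample):
    """Return (sample mean, sample standard deviation) for one sample."""
    return statistics.mean(sample), statistics.stdev(sample)

rng = random.Random(1)  # fixed seed so the run is reproducible
first = draw_sample(rng)
second = draw_sample(rng)

# The same two functions of the data give different values for each sample.
for label, sample in (("first", first), ("second", second)):
    xbar, s = summarize(sample)
    print(f"{label} sample: xbar = {xbar:.2f}, s = {s:.2f}")
```

Both samples come from the same population, yet the computed mean and standard deviation differ between them, which is exactly the uncertainty that exists before the data are observed.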

EXAMPLE 5.20

Definition (statistic)

A statistic is any quantity whose value can be calculated from sample data. Prior to obtaining data, there is uncertainty as to what value of any particular statistic will result. Therefore,

  • a statistic is a random variable and will be denoted by an uppercase letter;
  • a lowercase letter is used to represent the calculated or observed value of the statistic.

Thus the sample mean, regarded as a statistic (before a sample has been selected or an experiment carried out), is denoted by $\bar{X}$; the calculated value of this statistic is $\bar{x}$. Similarly, $S$ represents the sample standard deviation thought of as a statistic, and its computed value is $s$. If samples of two different types of bricks are selected and the individual compressive strengths are denoted by $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$, respectively, then the statistic $\bar{X} - \bar{Y}$, the difference between the two sample mean compressive strengths, is often of great interest.
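As a concrete illustration of the difference between a statistic and its observed value, the sketch below computes the observed difference between two sample means. The function name `mean_difference` and the strength values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import statistics

def mean_difference(xs, ys):
    """Observed value of the difference between two sample means."""
    return statistics.mean(xs) - statistics.mean(ys)

# Hypothetical compressive strengths for two brick types (illustrative only).
type_a = [10.0, 12.0, 11.0]       # sample of size 3, mean 11.0
type_b = [9.0, 10.0, 8.0, 9.0]    # sample of size 4, mean 9.0

print(mean_difference(type_a, type_b))  # 2.0
```

Before the bricks are tested, the difference is a random variable; the number printed here is one observed value of it, and a different pair of samples would generally give a different number.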

Any statistic, being a random variable, has a probability distribution. In particular, the sample mean $\bar{X}$ has a probability distribution. Suppose, for example, that $n = 2$ components are randomly selected and the number of breakdowns while under warranty is determined for each one. Possible values for the sample mean number of breakdowns $\bar{X}$ are 0 (if $X_1 = X_2 = 0$), 0.5 (if either $X_1 = 0$ and $X_2 = 1$ or $X_1 = 1$ and $X_2 = 0$), and so on. The probability distribution of $\bar{X}$ specifies $P(\bar{X} = 0)$, $P(\bar{X} = 0.5)$, and so on, from which other probabilities, such as the probability that $\bar{X}$ falls in a given interval or exceeds a given value, can be calculated. Similarly, if for a sample of size $n = 2$ the only possible values of the sample variance are 0, 12.5, and 50 (which is the case if $X_1$ and $X_2$ can each take on only the values 40, 45, or 50), then the probability distribution of $S^2$ gives $P(S^2 = 0)$, $P(S^2 = 12.5)$, and $P(S^2 = 50)$. The probability distribution of a statistic is sometimes referred to as its sampling distribution to emphasize that it describes how the statistic varies in value across all samples that might be selected.
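For a small discrete population, a sampling distribution can be obtained by enumerating every possible sample. The sketch below does this for samples of size 2, assuming the population values are 40, 45, and 50 (the values consistent with sample variances of 0, 12.5, and 50). The equal probabilities in `pmf`, the independence of the two draws, and the helper name `sampling_distribution` are assumptions made for illustration; the text does not specify them.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product
from statistics import mean, variance

# Hypothetical population pmf: three equally likely values (an assumption).
pmf = {40: Fraction(1, 3), 45: Fraction(1, 3), 50: Fraction(1, 3)}

def sampling_distribution(statistic, pmf, n=2):
    """Exact distribution of `statistic` over all ordered samples of size n,
    assuming independent draws from the population pmf."""
    dist = defaultdict(Fraction)
    for sample in product(pmf, repeat=n):
        prob = Fraction(1)
        for x in sample:
            prob *= pmf[x]  # independence: multiply the marginal probabilities
        dist[Fraction(statistic(sample))] += prob
    return dict(sorted(dist.items()))

var_dist = sampling_distribution(variance, pmf)   # distribution of S^2
mean_dist = sampling_distribution(mean, pmf)      # distribution of the sample mean

for value, prob in var_dist.items():
    print(f"P(S^2 = {float(value)}) = {prob}")
```

Whatever probabilities are assigned to the three population values, the sample variance can only equal 0, 12.5, or 50; the pmf affects only how likely each of those values is. Under the equal-probability assumption used here, the nine ordered samples give probabilities 1/3, 4/9, and 2/9, respectively.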