6.1 Some General Concepts of Point Estimation

Statistical inference is almost always directed toward drawing some type of conclusion about one or more parameters (population characteristics). To do so requires that an investigator obtain sample data from each of the populations under study. Conclusions can then be based on the computed values of various sample quantities. For example, let $μ$ (a parameter) denote the true average breaking strength of wire connections used in bonding semiconductor wafers. A random sample of $n = 10$ connections might be made, and the breaking strength of each one determined, resulting in observed strengths $x_{1}, x_{2}, \dots, x_{10}$. The sample mean breaking strength $\overset{ˉ}{x}$ could then be used to draw a conclusion about the value of $μ$. Similarly, if $σ^{2}$ is the variance of the breaking strength distribution (population variance, another parameter), the value of the sample variance $s^{2}$ can be used to infer something about $σ^{2}$.

When discussing general concepts and methods of inference, it is convenient to have a generic symbol for the parameter of interest. We will use the Greek letter $θ$ for this purpose. In many investigations, $θ$ will be a population mean $μ$, a difference $μ_{1} - μ_{2}$ between two population means, or a population proportion of “successes” $p$. The objective of point estimation is to select a single number, based on sample data, that represents a sensible value for $θ$. As an example, the parameter of interest might be $μ$, the true average lifetime of batteries of a certain type. A random sample of $n = 3$ batteries might yield observed lifetimes (hours) $x_{1} = 5.0$, $x_{2} = 6.4$, $x_{3} = 5.9$. The computed value of the sample mean lifetime is $\overset{ˉ}{x} = 5.77$, and it is reasonable to regard 5.77 as a very plausible value of $μ$: our “best guess” for the value of $μ$ based on the available sample information.

Suppose we want to estimate a parameter of a single population (e.g., $μ$ or $σ$) based on a random sample of size $n$. Recall from the previous chapter that before data is available, the sample observations must be considered random variables (rv’s) $X_{1}, X_{2}, \dots, X_{n}$. It follows that any function of the $X_{i}$’s (that is, any statistic), such as the sample mean $\overset{ˉ}{X}$ or the sample standard deviation $S$, is also a random variable. The same is true if available data consists of more than one sample. For example, we can represent tensile strengths of $m$ type 1 specimens and $n$ type 2 specimens by $X_{1}, \dots, X_{m}$ and $Y_{1}, \dots, Y_{n}$, respectively. The difference between the two sample mean strengths is $\overset{ˉ}{X} - \overset{ˉ}{Y}$; this is the natural statistic for making inferences about $μ_{1} - μ_{2}$, the difference between the population mean strengths.

DEFINITION

A point estimate of a parameter $θ$ is a single number that can be regarded as a sensible value for $θ$ . It is obtained by selecting a suitable statistic and computing its value from the given sample data. The selected statistic is called the point estimator of $θ$ .

In the foregoing battery example, the estimator used to obtain the point estimate of $μ$ was $\overset{ˉ}{X}$, and the point estimate of $μ$ was 5.77. If the three observed lifetimes had instead been $x_{1} = 5.6$, $x_{2} = 4.5$, and $x_{3} = 6.1$, use of the estimator $\overset{ˉ}{X}$ would have resulted in the estimate $\overset{ˉ}{x} = (5.6 + 4.5 + 6.1) /3 = 5.40$. The symbol $\hat{θ}$ (“theta hat”) is customarily used to denote both the estimator of $θ$ and the point estimate resulting from a given sample.* Thus $\hat{μ} = \overset{ˉ}{X}$ is read as “the point estimator of $μ$ is the sample mean $\overset{ˉ}{X}$.” The statement “the point estimate of $μ$ is 5.77” can be written concisely as $\hat{μ} = 5.77$. Notice that in writing $\hat{θ} = 72.5$, there is no indication of how this point estimate was obtained (what statistic was used). It is recommended that both the estimator and the resulting estimate be reported.

* Following earlier notation, we could use $\hat{Θ}$ (an uppercase theta with a hat) for the estimator, but this is cumbersome to write.

EXAMPLE 6.1 An automobile manufacturer has developed a new type of bumper, which is supposed to absorb impacts with less damage than previous bumpers. The manufacturer has used this bumper in a sequence of 25 controlled crashes against a wall, each at 10 mph, using one of its compact car models. Let $X =$ the number of crashes that result in no visible damage to the automobile. The parameter to be estimated is $p =$ the proportion of all such crashes that result in no damage (alternatively, $p = P(\text{no damage in a single crash})$). If $X$ is observed to be $x = 15$, the most reasonable estimator and estimate are

\text{estimator } \hat{p} = \frac{X}{n} \qquad \text{estimate} = \frac{x}{n} = \frac{15}{25} = .60

If for each parameter of interest there were only one reasonable point estimator, there would not be much to point estimation. In most problems, though, there will be more than one reasonable estimator.

EXAMPLE 6.2 Consider the accompanying 20 observations on dielectric breakdown voltage for pieces of epoxy resin, first introduced in Exercise 4.89.

24.46	25.61	26.25	26.42	26.66	27.15	27.31	27.54	27.74	27.94
27.98	28.04	28.28	28.49	28.50	28.87	29.11	29.13	29.50	30.88

The pattern in the normal probability plot given there is quite straight, so we now assume that the distribution of breakdown voltage is normal with mean value $μ$. Because normal distributions are symmetric, $μ$ is also the median of the distribution. The given observations are then assumed to be the result of a random sample $X_{1}, X_{2}, \dots, X_{20}$ from this normal distribution. Consider the following estimators and resulting estimates for $μ$:

a. Estimator $= \overset{ˉ}{X}$, estimate $= \overset{ˉ}{x} = \sum x_{i} / n = 555.86 / 20 = 27.793$

b. Estimator $= \tilde{X}$ (the sample median), estimate $= \tilde{x} = (27.94 + 27.98) /2 = 27.960$

c. Estimator $= [\min (X_{i}) + \max (X_{i})] /2 =$ the average of the two extreme observations, estimate $= [\min (x_{i}) + \max (x_{i})] /2 = (24.46 + 30.88) /2 = 27.670$

d. Estimator $= \overset{ˉ}{X}_{tr (10)}$, the 10% trimmed mean (discard the smallest and largest 10% of the sample and then average),

\text{estimate} = \overset{ˉ}{x}_{tr (10)} = \frac{555.86 - 24.46 - 25.61 - 29.50 - 30.88}{16} = 27.838

Each one of the estimators (a)-(d) uses a different measure of the center of the sample to estimate $μ$ . Which of the estimates is closest to the true value? This question cannot be answered without knowing the true value. A question that can be answered is, “Which estimator, when used on other samples of $X_{i}$ ’s, will tend to produce estimates closest to the true value?” We will shortly address this issue.
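The four estimates in Example 6.2 can be reproduced with a few lines of Python. This is only a minimal sketch: the helper names `midrange` and `trimmed_mean` are illustrative choices, not functions from any particular statistics package.

```python
from statistics import mean, median

# Breakdown voltages from Example 6.2 (n = 20).
voltages = [
    24.46, 25.61, 26.25, 26.42, 26.66, 27.15, 27.31, 27.54, 27.74, 27.94,
    27.98, 28.04, 28.28, 28.49, 28.50, 28.87, 29.11, 29.13, 29.50, 30.88,
]


def midrange(data):
    """Average of the smallest and largest observations."""
    return (min(data) + max(data)) / 2


def trimmed_mean(data, proportion):
    """Drop the given proportion of observations from each end, then average."""
    ordered = sorted(data)
    k = int(round(proportion * len(ordered)))  # observations trimmed per end
    return mean(ordered[k:len(ordered) - k])


estimates = {
    "sample mean": mean(voltages),
    "sample median": median(voltages),
    "midrange": midrange(voltages),
    "10% trimmed mean": trimmed_mean(voltages, 0.10),
}

for name, value in estimates.items():
    print(f"{name:>17}: {value:.3f}")
```

All four functions are applied to the same sample, which makes it easy to see that the choice of estimator, not the data, is what produces the different estimates.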

EXAMPLE 6.3 The article “Is a Normal Distribution the Most Appropriate Statistical Distribution for Volumetric Properties in Asphalt Mixtures?” first cited in Example 4.26, reported the following observations on $X =$ voids filled with asphalt (%) for 52 specimens of a certain type of hot-mix asphalt:

74.33	71.07	73.82	77.42	79.35	82.27	77.75	78.65	77.19
74.69	77.25	74.84	60.90	60.75	74.09	65.36	67.84	69.97
68.83	75.09	62.54	67.47	72.00	66.51	68.21	64.46	64.34
64.93	67.33	66.08	67.31	74.87	69.40	70.83	81.73	82.50
79.87	81.96	79.51	84.12	80.61	79.89	79.70	78.74	77.28
79.97	75.09	74.38	77.67	83.73	80.39	76.90

Let’s estimate the variance $σ^{2}$ of the population distribution. A natural estimator is the sample variance:

\hat{σ}^{2} = S^{2} = \frac{\sum (X_{i} - \overset{ˉ}{X})^{2}}{n - 1}

Minitab gave the following output from a request to display descriptive statistics:

Variable	Count	Mean	SE Mean	StDev	Variance	Q1	Median	Q3
VFA(B)	52	73.880	0.889	6.413	41.126	67.933	74.855	79.470

Thus the point estimate of the population variance is

\hat{σ}^{2} = s^{2} = \frac{\sum (x_{i} - \overset{ˉ}{x})^{2}}{52 - 1} = 41.126

[alternatively, the computational formula for the numerator of $s^{2}$ gives

S_{xx} = \sum x_{i}^{2} - (\sum x_{i})^{2} / n = 285,929.5964 - (3841.78)^{2} / 52 = 2097.4124].

A point estimate of the population standard deviation is then $\hat{σ} = s = \sqrt{41.126} = 6.413$.

An alternative estimator results from using the divisor $n$ rather than $n - 1$ :

\hat{σ}^{2} = \frac{\sum (X_{i} - \overset{ˉ}{X})^{2}}{n}, \qquad \text{estimate} = \frac{2097.4124}{52} = 40.335

We will shortly indicate why many statisticians prefer $S^{2}$ to this latter estimator.

The cited article considered fitting four different distributions to the data: normal, lognormal, two-parameter Weibull, and three-parameter Weibull. Several different techniques were used to conclude that the two-parameter Weibull provided the best fit (a normal probability plot of the data shows some deviation from a linear pattern). From Section 4.5, the variance of a Weibull random variable is

σ^{2} = β^{2} {Γ (1 + 2/ α) - [Γ (1 + 1/ α)]^{2}}

where $α$ and $β$ are the shape and scale parameters of the distribution. The authors of the article used the method of maximum likelihood (see Section 6.2) to estimate these parameters. The resulting estimates were $\hat{α} = 11.9731$, $\hat{β} = 77.0153$. A sensible estimate of the population variance can now be obtained by substituting the estimates of the two parameters into the expression for $σ^{2}$; the result is $\hat{σ}^{2} = 56.035$. This latter estimate is obviously quite different from the sample variance. Its validity depends on the population distribution being Weibull, whereas the sample variance is a sensible way to estimate $σ^{2}$ when there is uncertainty as to the specific form of the population distribution.
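As a rough check on the arithmetic in Example 6.3, the sketch below recomputes the three variance estimates from the summary quantities quoted in the example rather than from the raw data. The function name `weibull_variance` is an illustrative choice.

```python
from math import gamma, sqrt

# Summary quantities quoted in Example 6.3.
n = 52
sum_x = 3841.78
sum_x_sq = 285929.5964

s_xx = sum_x_sq - sum_x ** 2 / n   # numerator of the sample variance
var_unbiased = s_xx / (n - 1)      # divisor n - 1 (the sample variance s^2)
var_divisor_n = s_xx / n           # divisor n
sd_estimate = sqrt(var_unbiased)   # point estimate of sigma

# Maximum likelihood estimates of the Weibull parameters from the article.
alpha_hat = 11.9731  # shape
beta_hat = 77.0153   # scale


def weibull_variance(alpha, beta):
    """Variance of a Weibull rv with shape alpha and scale beta."""
    return beta ** 2 * (gamma(1 + 2 / alpha) - gamma(1 + 1 / alpha) ** 2)


var_weibull = weibull_variance(alpha_hat, beta_hat)

print(f"S_xx                 = {s_xx:.4f}")
print(f"s^2 (divisor n - 1)  = {var_unbiased:.3f}")
print(f"divisor-n estimate   = {var_divisor_n:.3f}")
print(f"s                    = {sd_estimate:.3f}")
print(f"Weibull plug-in est. = {var_weibull:.3f}")
```

The plug-in estimate uses only the two fitted parameters, which is why it is sensitive to the Weibull assumption in a way the sample variance is not.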

In the best of all possible worlds, we could find an estimator $\hat{θ}$ for which $\hat{θ} = θ$ always. However, $\hat{θ}$ is a function of the sample $X_{i}$’s, so it is a random variable. For some samples, $\hat{θ}$ will yield a value larger than $θ$, whereas for other samples $\hat{θ}$ will underestimate $θ$. If we write

\hat{θ} = θ + \text{error of estimation}

then an accurate estimator would be one resulting in small estimation errors, so that estimated values will be near the true value.

A sensible way to quantify the idea of $\hat{θ}$ being close to $θ$ is to consider the squared error $(\hat{θ} - θ)^{2}$. For some samples, $\hat{θ}$ will be quite close to $θ$ and the resulting squared error will be near 0. Other samples may give values of $\hat{θ}$ far from $θ$, corresponding to very large squared errors. An omnibus measure of accuracy is the expected or mean square error MSE $= E [(\hat{θ} - θ)^{2}]$. If a first estimator has smaller MSE than does a second, it is natural to say that the first estimator is the better one. However, MSE will generally depend on the value of $θ$. What often happens is that one estimator will have a smaller MSE for some values of $θ$ and a larger MSE for other values. Finding an estimator with the smallest MSE is typically not possible.

One way out of this dilemma is to restrict attention just to estimators that have some specified desirable property and then find the best estimator in this restricted group. A popular property of this sort in the statistical community is unbiasedness.

6.1.1 Unbiased Estimators

Suppose we have two measuring instruments; one instrument has been accurately calibrated, but the other systematically gives readings larger than the true value being measured. When each instrument is used repeatedly on the same object, because of measurement error, the observed measurements will not be identical. However, the measurements produced by the first instrument will be distributed about the true value in such a way that on average this instrument measures what it purports to measure, so it is called an unbiased instrument. The second instrument yields observations that have a systematic error component or bias. Figure 6.1 shows 10 measurements from both an unbiased and a biased instrument.


Figure 6.1 Measurements from (a) an unbiased instrument, and (b) a biased instrument

DEFINITION

A point estimator $\hat{θ}$ is said to be an unbiased estimator of $θ$ if $E (\hat{θ}) = θ$ for every possible value of $θ$. If $\hat{θ}$ is not unbiased, the difference $E (\hat{θ}) - θ$ is called the bias of $\hat{θ}$.

That is, $\hat{θ}$ is unbiased if its probability (i.e., sampling) distribution is always “centered” at the true value of the parameter. Suppose $\hat{θ}$ is an unbiased estimator; then if $θ = 100$, the $\hat{θ}$ sampling distribution is centered at 100; if $θ = 27.5$, then the $\hat{θ}$ sampling distribution is centered at 27.5, and so on. Figure 6.2 pictures the distributions of several biased and unbiased estimators. Note that “centered” here means that the expected value, not the median, of the distribution of $\hat{θ}$ is equal to $θ$.


Figure 6.2 The pdf’s of a biased estimator $\hat{θ}_{1}$ and an unbiased estimator $\hat{θ}_{2}$ for a parameter $θ$

It may seem as though it is necessary to know the value of $θ$ (in which case estimation is unnecessary) to see whether $\hat{θ}$ is unbiased. This is not usually the case, though, because unbiasedness is a general property of the estimator’s sampling distribution (where it is centered), which is typically not dependent on any particular parameter value.

In Example 6.1, the sample proportion $X / n$ was used as an estimator of $p$ , where $X$ , the number of sample successes, had a binomial distribution with parameters $n$ and $p$ . Thus

E (\hat{p}) = E \left(\frac{X}{n}\right) = \frac{1}{n} E (X) = \frac{1}{n} (n p) = p

PROPOSITION When $X$ is a binomial rv with parameters $n$ and $p$, the sample proportion $\hat{p} = X / n$ is an unbiased estimator of $p$.

No matter what the true value of $p$ is, the distribution of the estimator $\hat{p}$ will be centered at the true value.
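The unbiasedness of the sample proportion can also be checked numerically by summing $(x/n) P(X = x)$ over the binomial pmf. The sketch below does this for a few illustrative values of $p$ (the values themselves are arbitrary choices); the function name is hypothetical.

```python
from math import comb


def expected_sample_proportion(n, p):
    """Exact E(X/n) for X ~ Bin(n, p): sum of (x/n) * P(X = x) over x = 0..n."""
    return sum(
        (x / n) * comb(n, x) * p ** x * (1 - p) ** (n - x)
        for x in range(n + 1)
    )


n = 25  # number of crashes in Example 6.1
for p in (0.1, 0.35, 0.6, 0.9):
    print(f"p = {p:.2f}   E(X/n) = {expected_sample_proportion(n, p):.6f}")
```

Up to floating-point rounding, the computed expectation equals $p$ for each value tried, which is what "unbiased for every possible value of the parameter" requires.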

EXAMPLE 6.4 Suppose that $X$, the reaction time to a certain stimulus, has a uniform distribution on the interval from 0 to an unknown upper limit $θ$ (so the density function of $X$ is rectangular in shape with height $1/ θ$ for $0 \leq x \leq θ$). It is desired to estimate $θ$ on the basis of a random sample $X_{1}, X_{2}, \dots, X_{n}$ of reaction times. Since $θ$ is the largest possible time in the entire population of reaction times, consider as a first estimator the largest sample reaction time: $\hat{θ}_{1} = \max (X_{1}, \dots, X_{n})$. If $n = 5$ and $x_{1} = 4.2$, $x_{2} = 1.7$, $x_{3} = 2.4$, $x_{4} = 3.9$, and $x_{5} = 1.3$, the point estimate of $θ$ is $\hat{θ}_{1} = \max (4.2, 1.7, 2.4, 3.9, 1.3) = 4.2$.

Unbiasedness implies that some samples will yield estimates that exceed $θ$ and other samples will yield estimates smaller than $θ$; otherwise $θ$ could not possibly be the center (balance point) of $\hat{θ}_{1}$’s distribution. However, our proposed estimator will never overestimate $θ$ (the largest sample value cannot exceed the largest population value) and will underestimate $θ$ unless the largest sample value equals $θ$. This intuitive argument shows that $\hat{θ}_{1}$ is a biased estimator. More precisely, it can be shown (see Exercise 32) that

E (\hat{θ}_{1}) = \frac{n}{n + 1} \cdot θ < θ \qquad \left(\text{since } \frac{n}{n + 1} < 1\right)

The bias of $\hat{θ}_{1}$ is given by $n θ / (n + 1) - θ = - θ / (n + 1)$, which approaches 0 as $n$ gets large.

It is easy to modify $\hat{θ}_{1}$ to obtain an unbiased estimator of $θ$. Consider the estimator

\hat{θ}_{2} = \frac{n + 1}{n} \cdot \max (X_{1}, \dots, X_{n})

Using this estimator on the data gives the estimate $(6/5) (4.2) = 5.04$. The fact that $(n + 1) / n > 1$ implies that $\hat{θ}_{2}$ will overestimate $θ$ for some samples and underestimate it for others. The mean value of this estimator is

E (\hat{θ}_{2}) = E \left[\frac{n + 1}{n} \max (X_{1}, \dots, X_{n})\right] = \frac{n + 1}{n} \cdot E [\max (X_{1}, \dots, X_{n})] = \frac{n + 1}{n} \cdot \frac{n}{n + 1} θ = θ

If $\hat{θ}_{2}$ is used repeatedly on different samples to estimate $θ$, some estimates will be too large and others will be too small, but in the long run there will be no systematic tendency to underestimate or overestimate $θ$.
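The long-run behavior described in Example 6.4 can be illustrated by simulation. The sketch below is not part of the example: the value $θ = 5$, the number of replications, and the seed are arbitrary assumptions chosen only for illustration.

```python
import random


def simulate_max_estimators(theta, n, reps, seed=0):
    """Average max(X_1..X_n) and (n+1)/n * max over many uniform(0, theta) samples."""
    rng = random.Random(seed)
    total_max = 0.0
    for _ in range(reps):
        total_max += max(rng.uniform(0, theta) for _ in range(n))
    mean_theta1 = total_max / reps           # long-run average of the sample maximum
    mean_theta2 = (n + 1) / n * mean_theta1  # long-run average of the rescaled maximum
    return mean_theta1, mean_theta2


theta, n = 5.0, 5  # theta is a hypothetical "true" value
mean_theta1, mean_theta2 = simulate_max_estimators(theta, n, reps=50_000)

print(f"average of max:          {mean_theta1:.3f}  (theory: {n / (n + 1) * theta:.3f})")
print(f"average of (n+1)/n*max:  {mean_theta2:.3f}  (theory: {theta:.3f})")
```

The simulated average of the sample maximum should settle near $nθ/(n + 1)$, below $θ$, while the rescaled estimator should average out close to $θ$ itself.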

Principle of Unbiased Estimation

When choosing among several different estimators of $θ$ , select one that is unbiased.

According to this principle, the unbiased estimator $\hat{θ}_{2}$ in Example 6.4 should be preferred to the biased estimator $\hat{θ}_{1}$. Consider now the problem of estimating $σ^{2}$.

PROPOSITION

Let $X_{1}, X_{2}, \dots, X_{n}$ be a random sample from a distribution with mean $μ$ and variance $σ^{2}$ . Then the estimator

\hat{σ}^{2} = S^{2} = \frac{\sum (X_{i} - \overset{ˉ}{X})^{2}}{n - 1}

is unbiased for estimating $σ^{2}$ .

Proof For any rv $Y$, $V (Y) = E (Y^{2}) - [E (Y)]^{2}$, so $E (Y^{2}) = V (Y) + [E (Y)]^{2}$. Applying this to

S^{2} = \frac{1}{n - 1} \left[\sum X_{i}^{2} - \frac{(\sum X_{i})^{2}}{n}\right]

gives

E (S^{2}) = \frac{1}{n - 1} \left\{\sum E (X_{i}^{2}) - \frac{1}{n} E \left[\left(\sum X_{i}\right)^{2}\right]\right\}

= \frac{1}{n - 1} \left\{\sum (σ^{2} + μ^{2}) - \frac{1}{n} \left\{V \left(\sum X_{i}\right) + \left[E \left(\sum X_{i}\right)\right]^{2}\right\}\right\}

= \frac{1}{n - 1} \left\{n σ^{2} + n μ^{2} - \frac{1}{n} n σ^{2} - \frac{1}{n} (n μ)^{2}\right\}

= \frac{1}{n - 1} \left\{n σ^{2} - σ^{2}\right\} = σ^{2} \qquad \text{(as desired)}

The estimator that uses divisor $n$ can be expressed as $(n - 1) S^{2} / n$ , so

E [\frac{( n - 1 ) S ^{2}}{n}] = \frac{n - 1}{n} E (S^{2}) = \frac{n - 1}{n} σ^{2}

This estimator is therefore not unbiased. The bias is $(n - 1) σ^{2} / n - σ^{2} = - σ^{2} / n$ . Because the bias is negative, the estimator with divisor $n$ tends to underestimate $σ^{2}$ , and this is why the divisor $n - 1$ is preferred by many statisticians (though when $n$ is large, the bias is small and there is little difference between the two).

Unfortunately, the fact that $S^{2}$ is unbiased for estimating $σ^{2}$ does not imply that $S$ is unbiased for estimating $σ$ . Taking the square root invalidates the property of unbiasedness (the expected value of the square root is not the square root of the expected value). Fortunately, the bias of $S$ is small unless $n$ is quite small. There are other good reasons to use $S$ as an estimator, especially when the population distribution is normal. These will become more apparent when we discuss confidence intervals and hypothesis testing in the next several chapters.
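These facts about $S^{2}$, the divisor-$n$ estimator, and $S$ can be verified exactly for a small discrete population by enumerating every possible sample. The population $\{1, 2, 3, 4\}$ and the sample size $n = 3$ below are hypothetical choices made only to keep the enumeration short.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

# Hypothetical population: each value is equally likely on a single draw.
population = [1, 2, 3, 4]
n = 3  # sample size

mu = Fraction(sum(population), len(population))
sigma_sq = sum((Fraction(v) - mu) ** 2 for v in population) / len(population)

samples = list(product(population, repeat=n))  # all equally likely iid samples
weight = Fraction(1, len(samples))

exp_s_sq = Fraction(0)   # E(S^2), divisor n - 1
exp_div_n = Fraction(0)  # expected value of the divisor-n estimator
exp_s = 0.0              # E(S), accumulated in floating point
for sample in samples:
    xbar = Fraction(sum(sample), n)
    ss = sum((Fraction(x) - xbar) ** 2 for x in sample)
    exp_s_sq += weight * ss / (n - 1)
    exp_div_n += weight * ss / n
    exp_s += float(weight) * sqrt(float(ss / (n - 1)))

print(f"sigma^2            = {sigma_sq}")
print(f"E(S^2)             = {exp_s_sq}")
print(f"E(divisor-n est.)  = {exp_div_n}")
print(f"sigma = {sqrt(float(sigma_sq)):.4f},  E(S) = {exp_s:.4f}")
```

Exact rational arithmetic is used for the two variance estimators so that the equality $E(S^{2}) = σ^{2}$ and the factor $(n - 1)/n$ appear without rounding error; only $E(S)$, which involves square roots, is computed in floating point.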

In Example 6.2, we proposed several different estimators for the mean $μ$ of a normal distribution. If there were a unique unbiased estimator for $μ$ , the estimation problem would be resolved by using that estimator. Unfortunately, this is not the case.

PROPOSITION

If $X_{1}, X_{2}, \dots, X_{n}$ is a random sample from a distribution with mean $μ$, then $\overset{ˉ}{X}$ is an unbiased estimator of $μ$. If in addition the distribution is continuous and symmetric, then the sample median $\tilde{X}$ and any trimmed mean are also unbiased estimators of $μ$.

The fact that $\overset{ˉ}{X}$ is unbiased is just a restatement of one of our rules of expected value: $E (\overset{ˉ}{X}) = μ$ for every possible value of $μ$ (for discrete as well as continuous distributions). The unbiasedness of the other estimators is more difficult to verify.

The next example introduces another situation in which there are several unbiased estimators for a particular parameter.

EXAMPLE 6.5 Under certain circumstances organic contaminants adhere readily to wafer surfaces and cause deterioration in semiconductor manufacturing devices. The article “Ceramic Chemical Filter for Removal of Organic Contaminants” (J. of the Institute of Envir. Sciences and Tech., 2003: 59-65) discussed a recently developed alternative to conventional charcoal filters for removing organic airborne molecular contamination in cleanroom applications. One aspect of the investigation of filter performance involved studying how contaminant concentration in air related to concentration on a wafer surface after prolonged exposure. Consider the following representative data on $x =$ DBP concentration in air and $y =$ DBP concentration on a wafer surface after 4-hour exposure (both in $μ\text{g}/\text{m}^{3}$, where DBP = dibutyl phthalate).

Obs. $i$ :	1	2	3	4	5	6
$x$ :	.8	1.3	1.5	3.0	11.6	26.6
$y$ :	.6	1.1	4.5	3.5	14.4	29.1

The authors comment that “DBP adhesion on the wafer surface was roughly proportional to the DBP concentration in air.” Figure 6.3 shows a plot of $y$ versus $x$, i.e., of the $(x, y)$ pairs.

Figure 6.3 Plot of $y$ versus $x$ for the DBP data

If $y$ were exactly proportional to $x$ , then $y = β x$ for some value $β$ , which says that the $(x, y)$ points in the plot would lie exactly on a straight line with slope $β$ passing through $(0, 0)$ . But this is only approximately the case. So we now assume that for any fixed $x$ , wafer DBP is a random variable $Y$ having mean value $β x$ . That is, we postulate that the mean value of $Y$ is related to $x$ by a line passing through $(0, 0)$ but that the observed value of $Y$ will typically deviate from this line (this is referred to in the statistical literature as “regression through the origin”).

Consider the following three estimators for the slope parameter $β$ :

\#1: \hat{β} = \frac{1}{n} \sum \frac{Y_{i}}{x_{i}} \qquad \#2: \hat{β} = \frac{\sum Y_{i}}{\sum x_{i}} \qquad \#3: \hat{β} = \frac{\sum x_{i} Y_{i}}{\sum x_{i}^{2}}

The resulting estimates based on the given data are 1.3497, 1.1875, and 1.1222, respectively. So the estimate definitely depends on which estimator is used. If one of these three estimators were unbiased and the other two were biased, there would be a good case for using the unbiased one. But all three are unbiased; the argument relies on the fact that each one is a linear function of the $Y_{i}$ ’s (we are assuming here that the $x_{i}$ ’s are fixed, not random):

E (\frac{1}{n} \sum \frac{Y _{i}}{x _{i}}) = \frac{1}{n} \sum \frac{E ( Y _{i} )}{x _{i}} = \frac{1}{n} \sum \frac{β x _{i}}{x _{i}} = \frac{1}{n} \sum β = \frac{n β}{n} = β

E (\frac{\sum Y _{i}}{\sum x _{i}}) = \frac{1}{\sum x _{i}} E (\sum Y_{i}) = \frac{1}{\sum x _{i}} (\sum β x_{i}) = \frac{1}{\sum x _{i}} β (\sum x_{i}) = β

E (\frac{\sum x _{i} Y _{i}}{\sum x _{i}^{2}}) = \frac{1}{\sum x _{i}^{2}} E (\sum x_{i} Y_{i}) = \frac{1}{\sum x _{i}^{2}} (\sum x_{i} β x_{i}) = \frac{1}{\sum x _{i}^{2}} β (\sum x_{i}^{2}) = β
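The three slope estimates quoted in Example 6.5 follow directly from the six $(x, y)$ pairs. A minimal sketch, with variable names `beta_1`, `beta_2`, `beta_3` chosen here to match the numbering of the estimators:

```python
# DBP data from Example 6.5.
x = [0.8, 1.3, 1.5, 3.0, 11.6, 26.6]   # concentration in air
y = [0.6, 1.1, 4.5, 3.5, 14.4, 29.1]   # concentration on wafer surface
n = len(x)

# Estimator #1: average of the individual ratios y_i / x_i.
beta_1 = sum(yi / xi for xi, yi in zip(x, y)) / n

# Estimator #2: ratio of the totals.
beta_2 = sum(y) / sum(x)

# Estimator #3: sum of x_i * y_i divided by sum of x_i squared.
beta_3 = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)

print(f"#1: {beta_1:.4f}   #2: {beta_2:.4f}   #3: {beta_3:.4f}")
```

Estimator #1 weights every observation equally, so the single large ratio at $x = 1.5$ pulls it upward, whereas #2 and #3 give more weight to the observations with large $x$.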

In both the foregoing example and the situation involving estimating a normal population mean, the principle of unbiasedness (preferring an unbiased estimator to a biased one) cannot be invoked to select an estimator. What we now need is a criterion for choosing among unbiased estimators.

6.1.2 Estimators with Minimum Variance

Suppose $\hat{θ}_{1}$ and $\hat{θ}_{2}$ are two estimators of $θ$ that are both unbiased. Then, although the distribution of each estimator is centered at the true value of $θ$, the spreads of the distributions about the true value may be different.

Principle of Minimum Variance Unbiased Estimation

Among all estimators of $θ$ that are unbiased, choose the one that has minimum variance. The resulting $\hat{θ}$ is called the minimum variance unbiased estimator (MVUE) of $θ$.

Figure 6.4(a) shows distributions of two different unbiased estimators. Use of the estimator with the more concentrated distribution is more likely than the other one to result in an estimate closer to $θ$ . Figure 6.4(b) displays estimates from the two estimators based on 10 different samples. The MVUE is, in a certain sense, the most likely among all unbiased estimators to produce an estimate close to the true $θ$ .


Figure 6.4 (a) Distributions of two unbiased estimators (b) Estimates based on 10 different samples

In Example 6.5, suppose each $Y_{i}$ is normally distributed with mean $β x_{i}$ and variance $σ^{2}$ (the assumption of constant variance). Then it can be shown that the third estimator $\hat{β} = \sum x_{i} Y_{i} / \sum x_{i}^{2}$ not only has smaller variance than either of the other two unbiased estimators, but in fact is the MVUE: it has smaller variance than any other unbiased estimator of $β$.

EXAMPLE 6.6 We argued in Example 6.4 that when $X_{1}, \dots, X_{n}$ is a random sample from a uniform distribution on $[0, θ]$, the estimator

\hat{θ}_{1} = \frac{n + 1}{n} \cdot \max (X_{1}, \dots, X_{n})

is unbiased for $θ$ (we previously denoted this estimator by $\hat{θ}_{2}$). This is not the only unbiased estimator of $θ$. The expected value of a uniformly distributed rv is just the midpoint of the interval of positive density, so $E (X_{i}) = θ /2$. This implies that $E (\overset{ˉ}{X}) = θ /2$, from which $E (2 \overset{ˉ}{X}) = θ$. That is, the estimator $\hat{θ}_{2} = 2 \overset{ˉ}{X}$ is unbiased for $θ$.

If $X$ is uniformly distributed on the interval from $A$ to $B$, then $V (X) = σ^{2} = (B - A)^{2} / 12$. Thus, in our situation, $V (X_{i}) = θ^{2} / 12$, $V (\overset{ˉ}{X}) = σ^{2} / n = θ^{2} / (12 n)$, and $V (\hat{θ}_{2}) = V (2 \overset{ˉ}{X}) = 4 V (\overset{ˉ}{X}) = θ^{2} / (3 n)$. The results of Exercise 32 can be used to show that $V (\hat{θ}_{1}) = θ^{2} / [n (n + 2)]$. The estimator $\hat{θ}_{1}$ has smaller variance than does $\hat{θ}_{2}$ if $3 n < n (n + 2)$, that is, if $0 < n^{2} - n = n (n - 1)$. As long as $n > 1$, $V (\hat{θ}_{1}) < V (\hat{θ}_{2})$, so $\hat{θ}_{1}$ is a better estimator than $\hat{θ}_{2}$. More advanced methods can be used to show that $\hat{θ}_{1}$ is the MVUE of $θ$: every other unbiased estimator of $θ$ has variance that exceeds $θ^{2} / [n (n + 2)]$.
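The variance comparison in Example 6.6 can be checked empirically. The following sketch simulates both unbiased estimators on the same samples; the choices $θ = 1$, $n = 5$, the replication count, and the seed are assumptions made for illustration only.

```python
import random
from statistics import mean, variance


def simulate_uniform_estimators(theta, n, reps, seed=1):
    """Return simulated values of (n+1)/n * max and 2 * xbar for uniform(0, theta)."""
    rng = random.Random(seed)
    scaled_max, twice_mean = [], []
    for _ in range(reps):
        sample = [rng.uniform(0, theta) for _ in range(n)]
        scaled_max.append((n + 1) / n * max(sample))
        twice_mean.append(2 * sum(sample) / n)
    return scaled_max, twice_mean


theta, n = 1.0, 5  # hypothetical true value and sample size
scaled_max, twice_mean = simulate_uniform_estimators(theta, n, reps=20_000)

var_max = variance(scaled_max)
var_mean = variance(twice_mean)

print(f"(n+1)/n * max: mean {mean(scaled_max):.3f}, variance {var_max:.4f}"
      f"  (theory {theta ** 2 / (n * (n + 2)):.4f})")
print(f"2 * xbar:      mean {mean(twice_mean):.3f}, variance {var_mean:.4f}"
      f"  (theory {theta ** 2 / (3 * n):.4f})")
```

Both simulated means should sit near $θ$, while the simulated variance of the rescaled maximum should be noticeably smaller than that of $2\overset{ˉ}{X}$, in line with the formulas above.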

One of the triumphs of mathematical statistics has been the development of methodology for identifying the MVUE in a wide variety of situations. The most important result of this type for our purposes concerns estimating the mean $μ$ of a normal distribution.

THEOREM

Let $X_{1}, \dots, X_{n}$ be a random sample from a normal distribution with parameters $μ$ and $σ$. Then the estimator $\hat{μ} = \overset{ˉ}{X}$ is the MVUE for $μ$.

Whenever we are convinced that the population being sampled is normal, the theorem says that $\overset{ˉ}{x}$ should be used to estimate $μ$. In Example 6.2, then, our estimate would be $\overset{ˉ}{x} = 27.793$.

In some situations, it is possible to obtain an estimator with small bias that would be preferred to the best unbiased estimator. This is illustrated in Figure 6.5. However, MVUEs are often easier to obtain than the type of biased estimator whose distribution is pictured.


Figure 6.5 A biased estimator that is preferable to the MVUE

6.1.3 Some Complications

The last theorem does not say that in estimating a population mean $μ$ , the estimator $\overset{ˉ}{X}$ should be used irrespective of the distribution being sampled.

EXAMPLE 6.7 Suppose we wish to estimate the thermal conductivity $μ$ of a certain material. Using standard measurement techniques, we will obtain a random sample $X_{1}, \dots, X_{n}$ of $n$ thermal conductivity measurements. Let’s assume that the population distribution is a member of one of the following three families:

f (x) = \frac{1}{\sqrt{2 π σ^{2}}} e^{- (x - μ)^{2} / (2 σ^{2})} \qquad - \infty < x < \infty \qquad (6.1)

f (x) = \frac{1}{π β [1 + ((x - μ) / β)^{2}]} \qquad - \infty < x < \infty \qquad (6.2)

f (x) = \begin{cases} \frac{1}{2 c} & μ - c \leq x \leq μ + c \\ 0 & \text{otherwise} \end{cases} \qquad (6.3)

The pdf (6.1) is the normal distribution, (6.2) is called the Cauchy distribution, and (6.3) is a uniform distribution. All three distributions are symmetric about $μ$. The Cauchy density curve is bell-shaped but with much heavier tails (more probability farther out) than the normal curve. In fact, the tails are so heavy that the mean value does not exist, though $μ$ is still the median and a location parameter for the distribution. The uniform distribution has no tails. The four estimators for $μ$ considered earlier are $\overset{ˉ}{X}$, $\tilde{X}$, $\overset{ˉ}{X}_{e}$ (the average of the two extreme observations), and $\overset{ˉ}{X}_{tr (10)}$, a trimmed mean.

The very important moral here is that the best estimator for $μ$ depends crucially on which distribution is being sampled. In particular,

1. If the random sample comes from a normal distribution, then $\overset{ˉ}{X}$ is the best of the four estimators, since it has minimum variance among all unbiased estimators.
2. If the random sample comes from a Cauchy distribution, then $\overset{ˉ}{X}$ and $\overset{ˉ}{X}_{e}$ are terrible estimators for $μ$, whereas $\tilde{X}$ is quite good (the MVUE is not known); $\overset{ˉ}{X}$ is bad because it is very sensitive to outlying observations, and the heavy tails of the Cauchy distribution make a few such observations likely to appear in any sample.
3. If the underlying distribution is uniform, the best estimator is $\overset{ˉ}{X}_{e}$; this estimator is greatly influenced by outlying observations, but the lack of tails makes such observations impossible.
4. The trimmed mean is best in none of these three situations but works reasonably well in all three. That is, $\overset{ˉ}{X}_{tr (10)}$ does not suffer too much in comparison with the best procedure in any of the three situations.

More generally, recent research in statistics has established that when estimating a point of symmetry $μ$ of a continuous probability distribution, a trimmed mean with trimming proportion 10% or 20% (from each end of the sample) produces reasonably behaved estimates over a very wide range of possible models. For this reason, a trimmed mean with small trimming percentage is said to be a robust estimator.
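The comparison in Example 6.7 can be explored with a small simulation. The sketch below is only illustrative: it assumes $μ = 10$, $n = 20$, unit scale ($σ = β = c = 1$), 2000 replications, and a fixed seed, and it scores each estimator by its average squared distance from $μ$ (for the Cauchy case $μ$ is the center of symmetry, not a mean). All function and dictionary names are hypothetical.

```python
import math
import random
from statistics import median


def sample_mean(data):
    return sum(data) / len(data)


def midrange(data):
    """Average of the two extreme observations."""
    return (min(data) + max(data)) / 2


def trimmed_mean_10(data):
    """10% trimmed mean: drop 10% of the observations from each end."""
    k = int(round(0.10 * len(data)))
    ordered = sorted(data)
    return sample_mean(ordered[k:len(ordered) - k])


ESTIMATORS = {
    "mean": sample_mean,
    "median": median,
    "midrange": midrange,
    "trimmed": trimmed_mean_10,
}

# One draw from each family, all symmetric about mu with unit scale.
SAMPLERS = {
    "normal": lambda rng, mu: rng.gauss(mu, 1.0),
    "cauchy": lambda rng, mu: mu + math.tan(math.pi * (rng.random() - 0.5)),
    "uniform": lambda rng, mu: rng.uniform(mu - 1.0, mu + 1.0),
}


def average_squared_errors(sampler, mu=10.0, n=20, reps=2000, seed=2):
    """Average (estimate - mu)^2 over simulated samples, for each estimator."""
    rng = random.Random(seed)
    totals = {name: 0.0 for name in ESTIMATORS}
    for _ in range(reps):
        sample = [sampler(rng, mu) for _ in range(n)]
        for name, estimator in ESTIMATORS.items():
            totals[name] += (estimator(sample) - mu) ** 2
    return {name: total / reps for name, total in totals.items()}


results = {dist: average_squared_errors(s) for dist, s in SAMPLERS.items()}

for dist, errors in results.items():
    summary = "  ".join(f"{name}={value:.4g}" for name, value in errors.items())
    print(f"{dist:>8}: {summary}")
```

In runs of this kind one would expect the sample mean to do well for normal data, the median to be far better than the mean for Cauchy data, and the midrange to do best for uniform data, with the trimmed mean never far behind the winner.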

In some situations, the choice is not between two different estimators constructed from the same sample, but instead between estimators based on two different experiments.

EXAMPLE 6.8 Suppose a certain type of component has a lifetime distribution that is exponential with parameter $λ$ so that expected lifetime is $μ = 1/ λ$ . A sample of $n$ such components is selected, and each is put into operation. If the experiment is continued until all $n$ lifetimes, $X_{1}, \dots, X_{n}$ , have been observed, then $\overset{ˉ}{X}$ is an unbiased estimator of $μ$ .

In some experiments, though, the components are left in operation only until the time of the $r$ th failure, where $r < n$ . This procedure is referred to as censoring. Let $Y_{1}$ denote the time of the first failure (the minimum lifetime among the $n$ components), $Y_{2}$ denote the time at which the second failure occurs (the second smallest lifetime), and so on. Since the experiment terminates at time $Y_{r}$ , the total accumulated lifetime at termination is

T_{r} = \sum_{i = 1}^{r} Y_{i} + (n - r) Y_{r}

We now demonstrate that $\hat{μ} = T_{r} / r$ is an unbiased estimator for $μ$. To do so, we need two properties of exponential variables:

1. The memoryless property (see Section 4.4), which says that at any time point, remaining lifetime has the same exponential distribution as original lifetime.
2. When $X_{1}, \dots, X_{k}$ are independent, each exponentially distributed with parameter $λ$, $\min (X_{1}, \dots, X_{k})$ is exponential with parameter $kλ$.

Since all $n$ components last until $Y_{1}$, $n - 1$ last an additional $Y_{2} - Y_{1}$, $n - 2$ an additional $Y_{3} - Y_{2}$ amount of time, and so on, another expression for $T_{r}$ is

T_{r} = n Y_{1} + (n - 1) (Y_{2} - Y_{1}) + (n - 2) (Y_{3} - Y_{2}) + \dots + (n - r + 1) (Y_{r} - Y_{r - 1})

But $Y_{1}$ is the minimum of $n$ exponential variables, so $E (Y_{1}) = 1/ (nλ)$. Similarly, $Y_{2} - Y_{1}$ is the smallest of the $n - 1$ remaining lifetimes, each exponential with parameter $λ$ (by the memoryless property), so $E (Y_{2} - Y_{1}) = 1/ [(n - 1) λ]$. Continuing, $E (Y_{i + 1} - Y_{i}) = 1/ [(n - i) λ]$, so

E (T_{r}) = n E (Y_{1}) + (n - 1) E (Y_{2} - Y_{1}) + \dots + (n - r + 1) E (Y_{r} - Y_{r - 1})

= n \cdot \frac{1}{nλ} + (n - 1) \cdot \frac{1}{( n - 1 ) λ} + \dots + (n - r + 1) \cdot \frac{1}{( n - r + 1 ) λ}

= \frac{r}{λ}

Therefore, $E (T_{r} / r) = (1/ r) E (T_{r}) = (1/ r) \cdot (r / λ) = 1/ λ = μ$ as claimed.

As an example, suppose 20 components are tested and $r = 10$ . Then if the first ten failure times are $11, 15, 29, 33, 35, 40, 47, 55, 58$ , and 72, the estimate of $μ$ is

\hat{μ} = \frac{11 + 15 + \dots + 72 + (10) (72)}{10} = 111.5

The advantage of the experiment with censoring is that it terminates more quickly than the uncensored experiment. However, it can be shown that $V (T_{r} / r) = 1/ (λ^{2} r)$ , which is larger than $1/ (λ^{2} n)$ , the variance of $\overset{ˉ}{X}$ in the uncensored experiment.
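The censored estimate can be checked numerically. The following is a minimal Python sketch, not part of the original text; the function names are illustrative. It computes $T_{r}$ both from the definition and from the gap form used in the unbiasedness argument, then divides by $r$.

```python
def accumulated_lifetime(failure_times, n):
    """Total accumulated lifetime T_r when testing stops at the r-th failure."""
    y = sorted(failure_times)
    r = len(y)
    if not 0 < r <= n:
        raise ValueError("need 1 <= r <= n")
    # The r failed components contribute their lifetimes; each of the
    # n - r survivors contributes the termination time y[-1] = Y_r.
    return sum(y) + (n - r) * y[-1]


def accumulated_lifetime_by_gaps(failure_times, n):
    """Same total via n*Y_1 + (n-1)(Y_2 - Y_1) + ... + (n-r+1)(Y_r - Y_{r-1})."""
    y = sorted(failure_times)
    total, previous = 0, 0
    for i, t in enumerate(y):
        total += (n - i) * (t - previous)  # n - i components are still running
        previous = t
    return total


times = [11, 15, 29, 33, 35, 40, 47, 55, 58, 72]
t_r = accumulated_lifetime(times, n=20)
mu_hat = t_r / len(times)
print(t_r, accumulated_lifetime_by_gaps(times, n=20), mu_hat)  # 1115 1115 111.5
```

Both expressions give the same total, and the estimate agrees with the value 111.5 computed above.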

6.1.4 Reporting a Point Estimate: The Standard Error

Besides reporting the value of a point estimate, some indication of its precision should be given. The usual measure of precision is the standard error of the estimator used.

DEFINITION

The standard error of an estimator $\hat{θ}$ is its standard deviation $σ_{\hat{θ}} = \sqrt{V (\hat{θ})}$. It is the magnitude of a typical or representative deviation between an estimate and the value of $θ$. If the standard error itself involves unknown parameters whose values can be estimated, substitution of these estimates into $σ_{\hat{θ}}$ yields the estimated standard error (estimated standard deviation) of the estimator. The estimated standard error can be denoted either by $\hat{σ}_{\hat{θ}}$ (the ^ over $σ$ emphasizes that $σ_{\hat{θ}}$ is being estimated) or by $s_{\hat{θ}}$.

EXAMPLE 6.9 (Example 6.2 continued)

Assuming that breakdown voltage is normally distributed, $\hat{μ} = \overset{ˉ}{X}$ is the best estimator of $μ$. If the value of $σ$ is known to be 1.5, the standard error of $\overset{ˉ}{X}$ is $σ_{\overset{ˉ}{X}} = σ / \sqrt{n} = 1.5 / \sqrt{20} = .335$. If, as is usually the case, the value of $σ$ is unknown, the estimate $\hat{σ} = s = 1.462$ is substituted into $σ_{\overset{ˉ}{X}}$ to obtain the estimated standard error $\hat{σ}_{\overset{ˉ}{X}} = s_{\overset{ˉ}{X}} = s / \sqrt{n} = 1.462 / \sqrt{20} = .327$.

EXAMPLE 6.10 (Example 6.1 continued)

The standard error of $\hat{p} = X / n$ is

σ_{\hat{p}} = \sqrt{V (X / n)} = \sqrt{\frac{V ( X )}{n ^{2}}} = \sqrt{\frac{n pq}{n ^{2}}} = \sqrt{\frac{pq}{n}}

Since $p$ and $q = 1 - p$ are unknown (else why estimate?), we substitute $\hat{p} = x / n$ and $\hat{q} = 1 - x / n$ into $σ_{\hat{p}}$, yielding the estimated standard error $\hat{σ}_{\hat{p}} = \sqrt{\hat{p} \hat{q} / n} = \sqrt{(.6) (.4) / 25} = .098$. Alternatively, since the largest value of $pq$ is attained when $p = q = .5$, an upper bound on the standard error is $\sqrt{1/ (4 n)} = .10$.
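The arithmetic in Examples 6.9 and 6.10 can be reproduced with a few lines of Python. This is only an illustrative sketch; the helper names `se_mean` and `se_proportion` are assumptions, not notation from the text.

```python
import math


def se_mean(sd, n):
    """Standard error of a sample mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)


def se_proportion(p, n):
    """Standard error of a sample proportion: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


se_known = se_mean(1.5, 20)          # sigma known (Example 6.9)
se_estimated = se_mean(1.462, 20)    # s substituted for sigma (Example 6.9)
se_p = se_proportion(0.6, 25)        # p-hat = .6, n = 25 (Example 6.10)
se_p_bound = se_proportion(0.5, 25)  # worst case p = q = .5

print(round(se_known, 3), round(se_estimated, 3), round(se_p, 3), round(se_p_bound, 3))
```

The rounded values should agree with .335, .327, .098, and .10 from the two examples.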

When the point estimator $\hat{θ}$ has approximately a normal distribution, which will often be the case when $n$ is large, then we can be reasonably confident that the true value of $θ$ lies within approximately 2 standard errors (standard deviations) of $\hat{θ}$. Thus if a sample of $n = 36$ component lifetimes gives $\hat{μ} = \bar{x} = 28.50$ and $s = 3.60$, then $s / \sqrt{n} = .60$, so “within 2 estimated standard errors of $\hat{μ}$” translates to the interval $28.50 \pm (2) (.60) = (27.30, 29.70)$.

If $\hat{θ}$ is not necessarily approximately normal but is unbiased, then it can be shown that the estimate will deviate from $θ$ by as much as 4 standard errors at most 6% of the time. We would then expect the true value to lie within 4 standard errors of $\hat{θ}$ (and this is a very conservative statement, since it applies to any unbiased $\hat{θ}$). Summarizing, the standard error tells us roughly within what distance of $\hat{θ}$ we can expect the true value of $θ$ to lie.
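A short Python sketch of both rules of thumb follows. It assumes that the "at most 6%" figure comes from Chebyshev's inequality, which bounds the chance of a deviation of $k$ or more standard deviations by $1/k^{2}$; the text does not name the result, so treat that attribution as an assumption. The function names are illustrative.

```python
import math


def k_se_interval(estimate, sd, n, k=2):
    """Interval estimate +/- k estimated standard errors, with SE = sd / sqrt(n)."""
    se = sd / math.sqrt(n)
    return estimate - k * se, estimate + k * se


def chebyshev_bound(k):
    """Upper bound on P(|estimator - theta| >= k standard errors), unbiased case."""
    return 1 / k ** 2


low, high = k_se_interval(28.50, 3.60, 36)
print(round(low, 2), round(high, 2))  # 27.3 29.7
print(chebyshev_bound(4))             # 0.0625
```

The bound for $k = 4$ is $1/16 = .0625$, consistent with the roughly 6% figure quoted above.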

The form of the estimator $\hat{θ}$ may be sufficiently complicated so that standard statistical theory cannot be applied to obtain an expression for $σ_{\hat{θ}}$. This is true, for example, in the case $θ = σ$, $\hat{θ} = S$; the standard deviation of the statistic $S$, $σ_{S}$, cannot in general be determined. In recent years, a new computer-intensive method called the bootstrap has been introduced to address this problem. Suppose that the population pdf is $f (x; θ)$, a member of a particular parametric family, and that data $x_{1}, x_{2}, \dots, x_{n}$ gives $\hat{θ} = 21.7$. We now use statistical software to obtain “bootstrap samples” from the pdf $f (x; 21.7)$, and for each sample calculate a “bootstrap estimate” $\hat{θ}^{*}$:

First bootstrap sample: $x_{1}^{*}, x_{2}^{*}, \dots, x_{n}^{*}$; estimate $= \hat{θ}_{1}^{*}$

Second bootstrap sample: $x_{1}^{*}, x_{2}^{*}, \dots, x_{n}^{*}$; estimate $= \hat{θ}_{2}^{*}$

$\vdots$

Bth bootstrap sample: $x_{1}^{*}, x_{2}^{*}, \dots, x_{n}^{*}$; estimate $= \hat{θ}_{B}^{*}$

$B = 100$ or 200 is often used. Now let $\overset{ˉ}{θ}^{*} = \sum \hat{θ}_{i}^{*} / B$, the sample mean of the bootstrap estimates. The bootstrap estimate of $\hat{θ}$’s standard error is now just the sample standard deviation of the $\hat{θ}_{i}^{*}$’s:

S_{\hat{θ}} = \sqrt{\frac{1}{B - 1} \sum (\hat{θ}_{i}^{*} - \overset{ˉ}{θ}^{*})^{2}}

(In the bootstrap literature, $B$ is often used in place of $B - 1$ ; for typical values of $B$ , there is usually little difference between the resulting estimates.)

EXAMPLE 6.11 A theoretical model suggests that $X$ , the time to breakdown of an insulating fluid between electrodes at a particular voltage, has $f (x; λ) = λ e^{- λ x}$ , an exponential distribution. A random sample of $n = 10$ breakdown times (min) gives the following data:

41.53   18.73   2.99   30.34   12.33   117.52   73.02   223.63   4.00   26.78

Since $E (X) = 1/ λ$, $E (\overset{ˉ}{X}) = 1/ λ$, so a reasonable estimate of $λ$ is $\hat{λ} = 1/ \bar{x} = 1/ 55.087 = .018153$. We then used a statistical computer package to obtain $B = 100$ bootstrap samples, each of size 10, from $f (x; .018153)$. The first such sample was $41.00, 109.70, 16.78, 6.31, 6.76, 5.62, 60.96, 78.81, 192.25, 27.61$, from which $\sum x_{i}^{*} = 545.8$ and $\hat{λ}_{1}^{*} = 1/ 54.58 = .01832$. The average of the 100 bootstrap estimates is $\overset{ˉ}{λ}^{*} = .02153$, and the sample standard deviation of these 100 estimates is $s_{\hat{λ}} = .0091$, the bootstrap estimate of $\hat{λ}$’s standard error. A histogram of the 100 $\hat{λ}_{i}^{*}$’s was somewhat positively skewed, suggesting that the sampling distribution of $\hat{λ}$ also has this property.
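The parametric bootstrap of Example 6.11 can be sketched in Python with the standard library. This is an illustration rather than the package the authors used. The seed and the function names are arbitrary assumptions, so the bootstrap mean and standard error will differ somewhat from the .02153 and .0091 reported above; only $\hat{λ}$ itself is reproduced exactly.

```python
import random
import statistics

data = [41.53, 18.73, 2.99, 30.34, 12.33, 117.52, 73.02, 223.63, 4.00, 26.78]


def rate_estimate(sample):
    """Estimate lambda as the reciprocal of the sample mean."""
    return 1 / statistics.mean(sample)


def parametric_bootstrap_se(sample, B=100, seed=1):
    """Bootstrap SE of the rate estimate, sampling from the fitted exponential."""
    rng = random.Random(seed)  # arbitrary seed, for reproducibility only
    n = len(sample)
    lam_hat = rate_estimate(sample)
    boot = [
        rate_estimate([rng.expovariate(lam_hat) for _ in range(n)])
        for _ in range(B)
    ]
    # statistics.stdev divides by B - 1, matching the formula in the text.
    return lam_hat, statistics.mean(boot), statistics.stdev(boot)


lam_hat, boot_mean, boot_se = parametric_bootstrap_se(data)
print(round(lam_hat, 6))  # 0.018153
print(boot_mean, boot_se)
```

Rerunning with a different seed or a larger $B$ gives a sense of how much the bootstrap standard error itself varies from run to run.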

Sometimes an investigator wishes to estimate a population characteristic without assuming that the population distribution belongs to a particular parametric family. An instance of this occurred in Example 6.7, where a 10% trimmed mean was proposed for estimating a symmetric population distribution’s center $θ$. The data of Example 6.2 gave $\hat{θ} = \bar{x}_{tr (10)} = 27.838$, but now there is no assumed $f (x; θ)$, so how can we obtain a bootstrap sample? The answer is to regard the sample itself as constituting the population (the $n = 20$ observations in Example 6.2) and take $B$ different samples, each of size $n$, with replacement from this population. Several of the books listed in the chapter bibliography provide more information about bootstrapping.
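Resampling with replacement from the sample itself can be sketched as follows. The 20 observations are assumed to be the Example 6.2 breakdown voltages, which are not reproduced in this section; they do give the 10% trimmed mean 27.838 quoted above, but they should be checked against that example. The function names, the choice $B = 200$, and the seed are illustrative.

```python
import random
import statistics

# Assumed to be the Example 6.2 data (not shown in this section).
voltages = [24.46, 25.61, 26.25, 26.42, 26.66, 27.15, 27.31, 27.54, 27.74, 27.94,
            27.98, 28.04, 28.28, 28.49, 28.50, 28.87, 29.11, 29.13, 29.50, 30.88]


def trimmed_mean(sample, proportion=0.10):
    """Mean after dropping the given proportion of values from each end."""
    xs = sorted(sample)
    k = int(proportion * len(xs))  # number of observations trimmed per end
    return statistics.mean(xs[k:len(xs) - k])


def nonparametric_bootstrap_se(sample, estimator, B=200, seed=1):
    """Bootstrap SE: resample n values with replacement, B times."""
    rng = random.Random(seed)  # arbitrary seed, for reproducibility only
    n = len(sample)
    boot = [estimator(rng.choices(sample, k=n)) for _ in range(B)]
    return statistics.stdev(boot)


theta_hat = trimmed_mean(voltages)
se_boot = nonparametric_bootstrap_se(voltages, trimmed_mean)
print(round(theta_hat, 3))  # 27.838
print(se_boot)
```

No parametric family is assumed here; the only modeling choice is that the sample stands in for the population.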

EXERCISES Section 6.1 (1-19)

The accompanying data on flexural strength (MPa) for concrete beams of a certain type was introduced in Example 1.2.

5.9	7.2	7.3	6.3	8.1	6.8	7.0
7.6	6.8	6.5	7.0	6.3	7.9	9.0
8.2	8.7	7.8	9.7	7.4	7.7	9.7
7.8	7.7	11.6	11.3	11.8	10.7

a. Calculate a point estimate of the mean value of strength for the conceptual population of all beams manufactured in this fashion, and state which estimator you used. [Hint: $\sum x_{i} = 219.8$ .]

b. Calculate a point estimate of the strength value that separates the weakest 50% of all such beams from the strongest 50%, and state which estimator you used.

c. Calculate and interpret a point estimate of the population standard deviation $σ$ . Which estimator did you use? [Hint: $\sum x_{i}^{2} = 1860.94$ .]

d. Calculate a point estimate of the proportion of all such beams whose flexural strength exceeds 10 MPa. [Hint: Think of an observation as a “success” if it exceeds 10.]

e. Calculate a point estimate of the population coefficient of variation $σ / μ$ , and state which estimator you used.

The National Health and Nutrition Examination Survey (NHANES) collects demographic, socioeconomic, dietary, and health-related information on an annual basis. Here is a sample of 20 observations on HDL cholesterol level (mg/dl) obtained from the 2009- 2010 survey (HDL is “good” cholesterol; the higher its value, the lower the risk for heart disease):

35	49	52	54	65	51	51
47	86	36	46	33	39	45
39	63	95	35	30	48

a. Calculate a point estimate of the population mean HDL cholesterol level.

b. Making no assumptions about the shape of the population distribution, calculate a point estimate of the value that separates the largest 50% of HDL levels from the smallest 50%.

c. Calculate a point estimate of the population standard deviation.

d. An HDL level of at least 60 is considered desirable as it corresponds to a significantly lower risk of heart disease. Making no assumptions about the shape of the population distribution, estimate the proportion $p$ of the population having an HDL level of at least 60.

Consider the following sample of observations on coating thickness for low-viscosity paint (“Achieving a Target Value for a Manufacturing Process: A Case Study,” J. of Quality Technology, 1992: 22-26):

.83   .88   .88   1.04   1.09   1.12   1.29   1.31
1.48   1.49   1.59   1.62   1.65   1.71   1.76   1.83

Assume that the distribution of coating thickness is normal (a normal probability plot strongly supports this assumption).

a. Calculate a point estimate of the mean value of coating thickness, and state which estimator you used.

b. Calculate a point estimate of the median of the coating thickness distribution, and state which estimator you used.

c. Calculate a point estimate of the value that separates the largest 10% of all values in the thickness distribution from the remaining 90%, and state which estimator you used. [Hint: Express what you are trying to estimate in terms of $μ$ and $σ$.]

d. Estimate $P (X < 1.5)$ , i.e., the proportion of all thickness values less than 1.5. [Hint: If you knew the values of $μ$ and $σ$ , you could calculate this probability. These values are not available, but they can be estimated.]

e. What is the estimated standard error of the estimator that you used in part (b)?

The article from which the data in Exercise 1 was extracted also gave the accompanying strength observations for cylinders:

6.1   5.8   7.8   7.1   7.2   9.2   6.6   8.3   7.0   8.3
7.8   8.1   7.4   8.5   8.9   9.8   9.7   14.1   12.6   11.2

Prior to obtaining data, denote the beam strengths by $X_{1}, \dots, X_{m}$ and the cylinder strengths by $Y_{1}, \dots, Y_{n}$ . Suppose that the $X_{i}$ ’s constitute a random sample from a distribution with mean $μ_{1}$ and standard deviation $σ_{1}$ and that the $Y_{i}$ ’s form a random sample (independent of the $X_{i}$ ’s) from another distribution with mean $μ_{2}$ and standard deviation $σ_{2}$ .

a. Use rules of expected value to show that $\overset{ˉ}{X} - \overset{ˉ}{Y}$ is an unbiased estimator of $μ_{1} - μ_{2}$ . Calculate the estimate for the given data.

b. Use rules of variance from Chapter 5 to obtain an expression for the variance and standard deviation (standard error) of the estimator in part (a), and then compute the estimated standard error.

c. Calculate a point estimate of the ratio $σ_{1} / σ_{2}$ of the two standard deviations.

d. Suppose a single beam and a single cylinder are randomly selected. Calculate a point estimate of the variance of the difference $X - Y$ between beam strength and cylinder strength.

As an example of a situation in which several different statistics could reasonably be used to calculate a point estimate, consider a population of $N$ invoices. Associated with each invoice is its “book value,” the recorded amount of that invoice. Let $T$ denote the total book value, a known amount. Some of these book values are erroneous. An audit will be carried out by randomly selecting $n$ invoices and determining the audited (correct) value for each one. Suppose that the sample gives the following results (in dollars).

Invoice	1	2	3	4	5
Book value	300	720	526	200	127
Audited value	300	520	526	200	157
Error	0	200	0	0	-30

Let

$\overset{ˉ}{Y} =$ sample mean book value

$\overset{ˉ}{X} =$ sample mean audited value

$\overset{ˉ}{D} =$ sample mean error

Propose three different statistics for estimating the total audited (i.e., correct) value: one involving just $N$ and $\overset{ˉ}{X}$, another involving $T$, $N$, and $\overset{ˉ}{D}$, and the last involving $T$ and $\overset{ˉ}{X} / \overset{ˉ}{Y}$. If $N = 5000$ and $T = 1{,}761{,}300$, calculate the three corresponding point estimates. (The article “Statistical Models and Analysis in Auditing,” Statistical Science, 1989: 2-33 discusses properties of these estimators.)

Urinary angiotensinogen (AGT) level is one quantitative indicator of kidney function. The article “Urinary Angiotensinogen as a Potential Biomarker of Chronic Kidney Diseases” (J. of the Amer. Society of Hypertension, 2008: 349-354) describes a study in which urinary AGT level (μg) was determined for a sample of adults with chronic kidney disease. Here is representative data (consistent with summary quantities and descriptions in the cited article):

2.6	6.2	7.4	9.6	11.5	13.5	14.5	17.0
20.0	28.8	29.5	29.5	41.7	45.7	56.2	56.2
66.1	66.1	67.6	74.1	97.7	141.3	147.9	177.8
186.2	186.2	190.6	208.9	229.1	229.1	288.4	288.4
346.7	407.4	426.6	575.4	616.6	724.4	812.8	1122.0

An appropriate probability plot supports the use of the lognormal distribution (see Section 4.5) as a reasonable model for urinary AGT level (this is what the investigators did).

a. Estimate the parameters of the distribution. [Hint: Remember that $X$ has a lognormal distribution with parameters $μ$ and $σ^{2}$ if $ln (X)$ is normally distributed with mean $μ$ and variance $σ^{2}$ .]

b. Use the estimates of part (a) to calculate an estimate of the expected value of AGT level. [Hint: What is $E (X)$?]

a. A random sample of 10 houses in a particular area, each of which is heated with natural gas, is selected and the amount of gas (therms) used during the month of January is determined for each house. The resulting observations are $103, 156, 118, 89, 125$ , $147, 122, 109, 138, 99$ . Let $μ$ denote the average gas usage during January by all houses in this area. Compute a point estimate of $μ$ .

b. Suppose there are 10,000 houses in this area that use natural gas for heating. Let $τ$ denote the total amount of gas used by all of these houses during January. Estimate $τ$ using the data of part (a). What estimator did you use in computing your estimate?

c. Use the data in part (a) to estimate $p$ , the proportion of all houses that used at least 100 therms.

d. Give a point estimate of the population median usage (the middle value in the population of all houses) based on the sample of part (a). What estimator did you use?

In a random sample of 80 components of a certain type, 12 are found to be defective.

a. Give a point estimate of the proportion of all such components that are not defective.

b. A system is to be constructed by randomly selecting two of these components and connecting them in series, as shown here.

[Figure: two components connected in series]

The series connection implies that the system will function if and only if neither component is defective (i.e., both components work properly). Estimate the proportion of all such systems that work properly. [Hint: If $p$ denotes the probability that a component works properly, how can $P$ (system works) be expressed in terms of $p$ ?]

Each of 150 newly manufactured items is examined and the number of scratches per item is recorded (the items are supposed to be free of scratches), yielding the following data:

Number of scratches per item	0	1	2	3	4	5	6	7
Observed frequency	18	37	42	30	13	7	2	1

Let $X =$ the number of scratches on a randomly chosen item, and assume that $X$ has a Poisson distribution with parameter $μ$ .

a. Find an unbiased estimator of $μ$ and compute the estimate for the data. [Hint: $E (X) = μ$ for $X$ Poisson, so $E (\overset{ˉ}{X}) = ?$]

b. What is the standard deviation (standard error) of your estimator? Compute the estimated standard error. [Hint: $σ_{X}^{2} = μ$ for $X$ Poisson.]

Using a long rod that has length $μ$ , you are going to lay out a square plot in which the length of each side is $μ$ . Thus the area of the plot will be $μ^{2}$ . However, you do not know the value of $μ$ , so you decide to make $n$ independent measurements $X_{1}, X_{2}, \dots, X_{n}$ of the length. Assume that each $X_{i}$ has mean $μ$ (unbiased measurements) and variance $σ^{2}$ .

a. Show that $\overset{ˉ}{X}^{2}$ is not an unbiased estimator for $μ^{2}$. [Hint: For any rv $Y$, $E (Y^{2}) = V (Y) + [E (Y)]^{2}$. Apply this with $Y = \overset{ˉ}{X}$.]

b. For what value of $k$ is the estimator $\overset{ˉ}{X}^{2} - k S^{2}$ unbiased for $μ^{2}$ ? [Hint: Compute $E (\overset{ˉ}{X}^{2} - k S^{2})$ .]

Of $n_{1}$ randomly selected male smokers, $X_{1}$ smoked filter cigarettes, whereas of $n_{2}$ randomly selected female smokers, $X_{2}$ smoked filter cigarettes. Let $p_{1}$ and $p_{2}$ denote the probabilities that a randomly selected male and female, respectively, smoke filter cigarettes.

a. Show that $(X_{1} / n_{1}) - (X_{2} / n_{2})$ is an unbiased estimator for $p_{1} - p_{2}$ . [Hint: $E (X_{i}) = n_{i} p_{i}$ for $i = 1, 2$ .]

b. What is the standard error of the estimator in part (a)?

c. How would you use the observed values $x_{1}$ and $x_{2}$ to estimate the standard error of your estimator?

d. If $n_{1} = n_{2} = 200, x_{1} = 127$ , and $x_{2} = 176$ , use the estimator of part (a) to obtain an estimate of $p_{1} - p_{2}$ .

e. Use the result of part (c) and the data of part (d) to estimate the standard error of the estimator.

Suppose a certain type of fertilizer has an expected yield per acre of $μ_{1}$ with variance $σ^{2}$ , whereas the expected yield for a second type of fertilizer is $μ_{2}$ with the same variance $σ^{2}$ . Let $S_{1}^{2}$ and $S_{2}^{2}$ denote the sample variances of yields based on sample sizes $n_{1}$ and $n_{2}$ , respectively, of the two fertilizers. Show that the pooled (combined) estimator

\hat{σ}^{2} = \frac{( n _{1} - 1 ) S _{1}^{2} + ( n _{2} - 1 ) S _{2}^{2}}{n _{1} + n _{2} - 2}

is an unbiased estimator of $σ^{2}$ .

Consider a random sample $X_{1}, \dots, X_{n}$ from the pdf

f (x; θ) = .5 (1 + θ x) \qquad - 1 \leq x \leq 1

where $- 1 \leq θ \leq 1$ (this distribution arises in particle physics). Show that $\hat{θ} = 3 \overset{ˉ}{X}$ is an unbiased estimator of $θ$. [Hint: First determine $μ = E (X) = E (\overset{ˉ}{X})$.]

A sample of $n$ captured Pandemonium jet fighters results in serial numbers $x_{1}, x_{2}, x_{3}, \dots, x_{n}$ . The CIA knows that the aircraft were numbered consecutively at the factory starting with $α$ and ending with $β$ , so that the total number of planes manufactured is $β - α + 1$ (e.g., if $α = 17$ and $β = 29$ , then $29 - 17 + 1 = 13$ planes having serial numbers $17, 18, 19, \dots, 28, 29$ were manufactured). However, the CIA does not know the values of $α$ or $β$ . A CIA statistician suggests using the estimator $max (X_{i}) - min (X_{i}) + 1$ to estimate the total number of planes manufactured.

a. If $n = 5, x_{1} = 237, x_{2} = 375, x_{3} = 202, x_{4} = 525$ , and $x_{5} = 418$ , what is the corresponding estimate?

b. Under what conditions on the sample will the value of the estimate be exactly equal to the true total number of planes? Will the estimate ever be larger than the true total? Do you think the estimator is unbiased for estimating $β - α + 1$ ? Explain in one or two sentences.

Let $X_{1}, X_{2}, \dots, X_{n}$ represent a random sample from a Rayleigh distribution with pdf

f (x; θ) = \frac{x}{θ} e^{- x^{2} / (2 θ)} \qquad x > 0

a. It can be shown that $E (X^{2}) = 2 θ$ . Use this fact to construct an unbiased estimator of $θ$ based on $\sum X_{i}^{2}$ (and use rules of expected value to show that it is unbiased).

b. Estimate $θ$ from the following $n = 10$ observations on vibratory stress of a turbine blade under specified conditions:

16.88 10.23 4.59 6.66 13.68

14.23 19.87 9.40 6.51 10.95

Suppose the true average growth $μ$ of one type of plant during a 1-year period is identical to that of a second type, but the variance of growth for the first type is $σ^{2}$, whereas for the second type the variance is $4 σ^{2}$. Let $X_{1}, \dots, X_{m}$ be $m$ independent growth observations on the first type [so $E (X_{i}) = μ$, $V (X_{i}) = σ^{2}$], and let $Y_{1}, \dots, Y_{n}$ be $n$ independent growth observations on the second type [$E (Y_{i}) = μ$, $V (Y_{i}) = 4 σ^{2}$].

a. Show that the estimator $\hat{μ} = δ \overset{ˉ}{X} + (1 - δ) \overset{ˉ}{Y}$ is unbiased for $μ$ (for $0 < δ < 1$, the estimator is a weighted average of the two individual sample means).

b. For fixed $m$ and $n$, compute $V (\hat{μ})$, and then find the value of $δ$ that minimizes $V (\hat{μ})$. [Hint: Differentiate $V (\hat{μ})$ with respect to $δ$.]

In Chapter 3, we defined a negative binomial rv as the number of failures that occur before the $r$ th success in a sequence of independent and identical success/failure trials. The probability mass function (pmf) of $X$ is

nb (x; r, p) = \binom{x + r - 1}{x} p^{r} (1 - p)^{x} \qquad x = 0, 1, 2, \dots

a. Suppose that $r \geq 2$ . Show that

\hat{p} = (r - 1) / (X + r - 1)

is an unbiased estimator for $p$. [Hint: Write out $E (\hat{p})$ and cancel $x + r - 1$ inside the sum.]

b. A reporter wishing to interview five individuals who support a certain candidate begins asking people whether $(S)$ or not $(F)$ they support the candidate. If the sequence of responses is SFFSFFFSSS, estimate $p =$ the true proportion who support the candidate.

Let $X_{1}, X_{2}, \dots, X_{n}$ be a random sample from a pdf $f (x)$ that is symmetric about $μ$, so that the sample median $\tilde{X}$ is an unbiased estimator of $μ$. If $n$ is large, it can be shown that $V (\tilde{X}) \approx 1/ (4 n [f (μ)]^{2})$.

a. Compare $V (\tilde{X})$ to $V (\overset{ˉ}{X})$ when the underlying distribution is normal.

b. When the underlying pdf is Cauchy (see Example 6.7), $V (\overset{ˉ}{X}) = \infty$, so $\overset{ˉ}{X}$ is a terrible estimator. What is $V (\tilde{X})$ in this case when $n$ is large?

An investigator wishes to estimate the proportion of students at a certain university who have violated the honor code. Having obtained a random sample of $n$ students, she realizes that asking each, “Have you violated the honor code?” will probably result in some untruthful responses. Consider the following scheme, called a randomized response technique. The investigator makes up a deck of 100 cards, of which 50 are of type I and 50 are of type II.

Type I: Have you violated the honor code (yes or no)?

Type II: Is the last digit of your telephone number a 0, 1, or 2 (yes or no)?

Each student in the random sample is asked to mix the deck, draw a card, and answer the resulting question truthfully. Because of the irrelevant question on type II cards, a yes response no longer stigmatizes the respondent, so we assume that responses are truthful. Let $p$ denote the proportion of honor-code violators (i.e., the probability of a randomly selected student being a violator), and let $λ = P$ (yes response). Then $λ$ and $p$ are related by $λ = .5 p + (.5) (.3)$ .

a. Let $Y$ denote the number of yes responses, so $Y \sim$ Bin $(n, λ)$ . Thus $Y / n$ is an unbiased estimator of $λ$ . Derive an estimator for $p$ based on $Y$ . If $n = 80$ and $y = 20$ , what is your estimate? [Hint: Solve $λ = .5 p + .15$ for $p$ and then substitute $Y / n$ for $λ$ .]

b. Use the fact that $E (Y / n) = λ$ to show that your estimator $\hat{p}$ is unbiased.

c. If there were 70 type I and 30 type II cards, what would be your estimator for $p$ ?
