6.2 Methods of Point Estimation

We now introduce two “constructive” methods for obtaining point estimators: the method of moments and the method of maximum likelihood. By constructive we mean that the general definition of each type of estimator suggests explicitly how to obtain the estimator in any specific problem. Although maximum likelihood estimators are generally preferable to moment estimators because of certain efficiency properties, they often require significantly more computation than do moment estimators. It is sometimes the case that these methods yield unbiased estimators.

6.2.1 The Method of Moments

The basic idea of this method is to equate certain sample characteristics, such as the mean, to the corresponding population expected values. Then solving these equations for unknown parameter values yields the estimators.

DEFINITION Let $X_{1}, \dots, X_{n}$ be a random sample from a pmf or $pdf f (x)$ . For $k = 1, 2$ ,

$3, \dots$ , the $k$ th population moment, or $k$ th moment of the distribution $f (x)$ ,

is $E (X^{k})$ . The $k$ th sample moment is $(1/ n) \sum_{i = 1}^{n} X_{i}^{k}$ . Copyright 2016 Cengage Learning: All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Congage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.

Thus the first population moment is $E (X) = μ$ , and the first sample moment is $\sum X_{i} / n = \overset{ˉ}{X}$ . The second population and sample moments are $E (X^{2})$ and $\sum X_{i}^{2} / n$ , respectively. The population moments will be functions of any unknown parameters $θ_{1}, θ_{2}, \dots$ .

DEFINITION Let $X_{1}, X_{2}, \dots, X_{n}$ be a random sample from a distribution with pmf or pdf $f (x; θ_{1}, \dots, θ_{m})$ , where $θ_{1}, \dots, θ_{m}$ are parameters whose values are unknown. Then the moment estimators $θ_{1}, \dots, θ_{m}$ are obtained by equating the first $m$ sample moments to the corresponding first $m$ population moments and solving for $θ_{1}, \dots, θ_{m}$ .

If, for example, $m = 2, E (X)$ and $E (X^{2})$ will be functions of $θ_{1}$ and $θ_{2}$ . Setting $E (X) = (1/ n) \sum X_{i} (= \overset{ˉ}{X})$ and $E (X^{2}) = (1/ n) \sum X_{i}^{2}$ gives two equations in $θ_{1}$ and $θ_{2}$ . The solution then defines the estimators.

EXAMPLE 6.12 Let $X_{1}, X_{2}, \dots, X_{n}$ represent a random sample of service times of $n$ customers at a certain facility, where the underlying distribution is assumed exponential with parameter $λ$ . Since there is only one parameter to be estimated, the estimator is obtained by equating $E (X)$ to $\overset{ˉ}{X}$ . Since $E (X) = 1/ λ$ for an exponential distribution, this gives $1/ λ = \overset{ˉ}{X}$ or $λ = 1/ \overset{ˉ}{X}$ . The moment estimator of $λ$ is then $λ = 1/ \overset{ˉ}{X}$ .

AMPLE 6.13 Let $X_{1}, \dots, X_{n}$ be a random sample from a gamma distribution with parameters $α$ and $β$ . From Section 4.4, $E (X) = α β$ and $E (X^{2}) = β^{2} Γ (α + 2) /Γ (α) = β^{2} (α + 1) α$ . The moment estimators of $α$ and $β$ are obtained by solving

\overset{ˉ}{X} = α β \frac{1}{n} \sum X_{i}^{2} = α (α + 1) β^{2}

Since $α (α + 1) β^{2} = α^{2} β^{2} + α β^{2}$ and the first equation implies $α^{2} β^{2} = \overset{ˉ}{X}^{2}$ , the second equation becomes

\frac{1}{n} \sum X_{i}^{2} = \overset{ˉ}{X}^{2} + α β^{2}

Now dividing each side of this second equation by the corresponding side of the first equation and substituting back gives the estimators

α = \frac{X ˉ ^{2}}{( 1/ n ) \sum X _{i}^{2} - X ˉ ^{2}} β = \frac{( 1/ n ) \sum X _{i}^{2} - X ˉ ^{2}}{X ˉ}

To illustrate, the survival-time data mentioned in Example 4.24 is

$152115109948813715277160165$

$12540128123136101621538369$

from which $\overset{x}{ˉ} = 113.5$ and $(1/ 20) \sum x_{i}^{2} = 14, 087.8$ . The parameter estimates are

α = \frac{( 113.5 ) ^{2}}{14 , 087.8 - ( 113.5 ) ^{2}} = 10.7 β = \frac{14 , 087.8 - ( 113.5 ) ^{2}}{113.5} = 10.6

These estimates of $α$ and $β$ differ from the values suggested by Gross and Clark because they used a different estimation technique.

EXAMPLE 6.14 Let $X_{1}, \dots, X_{n}$ be a random sample from a generalized negative binomial distribution with parameters $r$ and $p$ (see Section 3.5). Since $E (X) = r (1 - p) / p$ and $V (X) = r (1 - p) / p^{2}, E (X^{2}) = V (X) + [E (X)]^{2} = r (1 - p) (r - r p + 1) / p^{2}$ . Equating $E (X)$ to $\overset{ˉ}{X}$ and $E (X^{2})$ to $(1/ n) \sum X_{i}^{2}$ eventually gives

p = \frac{X ˉ}{( 1/ n ) \sum X _{i}^{2} - X ˉ ^{2}} r = \frac{X ˉ ^{2}}{( 1/ n ) \sum X _{i}^{2} - X ˉ ^{2} - X ˉ}

As an illustration, Reep, Pollard, and Benjamin (“Skill and Chance in Ball Games,” J. of Royal Stat. Soc., 1971: 623-629) consider the negative binomial distribution as a model for the number of goals per game scored by National Hockey League teams. The data for 1966-1967 follows (420 games):

Goals	0	1	2	3	4	5	6	7	8	9	10
Frequency	29	71	82	89	65	45	24	7	4	1	3

Then,

\overset{x}{ˉ} = \sum x_{i} / 420 = [(0) (29) + (1) (71) + \dots + (10) (3)] / 420 = 2.98

and

\sum x_{i}^{2} / 420 = [(0)^{2} (29) + (1)^{2} (71) + \dots + (10)^{2} (3)] / 420 = 12.40

Thus,

p = \frac{2.98}{12.40 - ( 2.98 ) ^{2}} = .85 r = \frac{( 2.98 ) ^{2}}{12.40 - ( 2.98 ) ^{2} - 2.98} = 16.5

Although $r$ by definition must be positive, the denominator of $r$ could be negative, indicating that the negative binomial distribution is not appropriate (or that the moment estimator is flawed).

6.2.2 Maximum Likelihood Estimation

The method of maximum likelihood was first introduced by R. A. Fisher, a geneticist and statistician, in the 1920s. Most statisticians recommend this method, at least when the sample size is large, since the resulting estimators have certain desirable efficiency properties (see the proposition on page 271).

XAMPLE 6.15 The best protection against hacking into an online account is to use a password that has at least 8 characters consisting of upper- and lowercase letters, numerals, and special characters. [Note: The Jan. 2012 issue of Consumer Reports reported that only $25 %$ of individuals surveyed used a strong password.] Suppose that 10 individuals who have email accounts with a certain provider are selected, and it is found that the first, third, and tenth individuals have such strong protection, whereas the others do not. Let $p = P$ (strong protection), i.e., $p$ is the proportion of all such account holders having strong protection. Define (Bernoulli) random variables $X_{1}, X_{2}, \dots, X_{10}$ by

X_{1} = {1 if 1 st has strong protection 0 if 1 st does not have strong protection \dots X_{10} = {1 if 10 th has strong protection 0 if 10 th does not have strong protection

Then for the obtained sample, $X_{1} = X_{3} = X_{10} = 1$ and the other seven $X_{i}$ ’s are all zero. The probability mass function of any particular $X_{i}$ is $p^{x_{i}} (1 - p)^{1 - x_{i}}$ , which becomes $p$ if $x_{i} = 1$ and $1 - p$ when $x_{i} = 0$ . Now suppose that the conditions of various passwords are independent of one another. This implies that the $X_{i}$ ’s are independent, so their joint probability mass function is the product of the individual pmf’s. Thus the joint pmf evaluated at the observed $X_{i}$ ’s is

f (x_{1}, \dots, x_{10}; p) = p (1 - p) p \dots p = p^{3} (1 - p)^{7} (6.4)

Suppose that $p = .25$ . Then the probability of observing the sample that we actually obtained is $(.25)^{3} (.75)^{7} = .002086$ . If instead $p = .50$ , then this probability is $(.50)^{3} (.50)^{7} = .000977$ . For what value of $p$ is the obtained sample most likely to have occurred? That is, for what value of $p$ is the joint pmf (6.4) as large as it can be? What value of $p$ maximizes (6.4)? Figure 6.6(a) shows a graph of the likelihood (6.4) as a function of $p$ . It appears that the graph reaches its peak above $p = .3 =$ the proportion of flawed helmets in the sample. Figure 6.6(b) shows a graph of the natural logarithm of (6.4); since $ln [g (u)]$ is a strictly increasing function of $g (u)$ , finding $u$ to maximize the function $g (u)$ is the same as finding $u$ to maximize $ln [g (u)]$ .

019264b3-14d7-793b-ad40-3bd36b82fa05_20_427_722_1329_615_0.jpg

Figure 6.6 (a) Graph of the likelihood (joint pmf) (6.4) from Example 6.15 (b) Graph of the natural logarithm of the likelihood

We can verify our visual impression by using calculus to find the value of $p$ that maximizes (6.4). Working with the natural $lo g$ of the joint pmf is often easier than working with the joint pmf itself, since the joint pmf is typically a product so its logarithm will be a sum. Here

ln [f (x_{1}, \dots, x_{10}; p)] = ln [p^{3} (1 - p)^{7}] = 3 ln (p) + 7 ln (1 - p) (6.5)

Thus

\frac{d}{d p} {ln [f (x_{1}, \dots, x_{10}; p)]} = \frac{d}{d p} {3 ln (p) + 7 ln (1 - p)} = \frac{3}{p} + \frac{7}{1 - p} (- 1)

= \frac{3}{p} - \frac{7}{1 - p}

[the $(- 1)$ comes from the chain rule in calculus]. Equating this derivative to 0 and solving for $p$ gives $3 (1 - p) = 7 p$ , from which $3 = 10 p$ and so $p = 3/ 10 = .30$ as conjectured. That is, our point estimate is $p = .30$ . It is called the maximum likelihood estimate because it is the parameter value that maximizes the likelihood (joint pmf) of the observed sample. In general, the second derivative should be examined to make sure a maximum has been obtained, but here this is obvious from Figure 6.5.

Suppose that rather than being told the condition of every password, we had only been informed that three of the ten were strong. Then we would have the observed value of a binomial random variable $X =$ the number with strong passwords. The pmf of $X$ is $(10 x) p^{x} (1 - p)^{10 - x}$ . For $x = 3$ , this becomes $(103) p^{3} (1 - p)^{7}$ . The binomial coefficient $(103)$ is irrelevant to the maximization, so again $p = .30$ .

DEFINITION

Let $X_{1}, X_{2}, \dots, X_{n}$ have joint pmf or pdf

f (x_{1}, x_{2}, \dots, x_{n}; θ_{1}, \dots, θ_{m}) (6.6)

where the parameters $θ_{1}, \dots, θ_{m}$ have unknown values. When $x_{1}, \dots, x_{n}$ are the observed sample values and (6.6) is regarded as a function of $θ_{1}, \dots, θ_{m}$ , it is called the likelihood function. The maximum likelihood estimates (mle’s) $θ_{1}, \dots, θ_{m}$ are those values of the $θ_{i}$ ’s that maximize the likelihood function, so that

f (x_{1}, \dots, x_{n}; θ_{1}, \dots, θ_{m}) \geq f (x_{1}, \dots, x_{n}; θ_{1}, \dots, θ_{m}) for all θ_{1}, \dots, θ_{m}

When the $X_{i}$ ’s are substituted in place of the $x_{i}$ ’s, the maximum likelihood estimators result.

The likelihood function tells us how likely the observed sample is as a function of the possible parameter values. Maximizing the likelihood gives the parameter values for which the observed sample is most likely to have been generated-that is, the parameter values that “agree most closely” with the observed data.

EXAMPLE 6.16 Suppose $X_{1}, X_{2}, \dots, X_{n}$ is a random sample from an exponential distribution with parameter $λ$ . Because of independence, the likelihood function is a product of the individual pdf’s:

f (x_{1}, \dots, x_{n}; λ) = (λ e^{- λ x_{1}}) \dots\dots (λ e^{- λ x_{n}}) = λ^{n} e^{- λ \sum x_{i}}

The natural logarithm of the likelihood function is

ln [f (x_{1}, \dots, x_{n}; λ)] = n ln (λ) - λ \sum x_{i}

Equating $(d / d λ) [ln (likelihood)]$ to zero results in $n / λ - \sum x_{i} = 0$ , or $λ = n / \sum x_{i} = 1/ \overset{x}{ˉ}$ . Thus the mle is $λ = 1/ \overset{ˉ}{X}$ ; it is identical to the method of moments estimator [but it is not an unbiased estimator, since $E (1/ \overset{ˉ}{X}) \neq = 1/ E (\overset{ˉ}{X})]$ .

EXAMPLE 6.17 Let $X_{1}, \dots, X_{n}$ be a random sample from a normal distribution. The likelihood function is

f (x_{1}, \dots, x_{n}; μ, σ^{2}) = \frac{1}{2 π σ ^{2}} e^{- (x_{1} - μ)^{2} / (2 σ^{2})} \cdot \dots \cdot \frac{1}{2 π σ ^{2}} e^{- (x_{n} - μ)^{2} / (2 σ^{2})}

= (\frac{1}{2 π σ ^{2}})^{n /2} e^{- \sum (x_{i} - μ)^{2} / (2 σ^{2})}

ln [f (x_{1}, \dots, x_{n}; μ, σ^{2})] = - \frac{n}{2} ln (2 π σ^{2}) - \frac{1}{2 σ ^{2}} \sum (x_{i} - μ)^{2}

To find the maximizing values of $μ$ and $σ^{2}$ , we must take the partial derivatives of $ln (f)$ with respect to $μ$ and $σ^{2}$ , equate them to zero, and solve the resulting two equations. Omitting the details, the resulting mle’s are

μ = \overset{ˉ}{X} σ^{2} = \frac{\sum ( X _{i} - X ˉ ) ^{2}}{n}

The mle of $σ^{2}$ is not the unbiased estimator, so two different principles of estimation (unbiasedness and maximum likelihood) yield two different estimators.

EXAMPLE 6.18 In Chapter 3, we mentioned the use of the Poisson distribution for modeling the number of “events” that occur in a two-dimensional region. Assume that when the region $R$ being sampled has area $a (R)$ , the number $X$ of events occurring in $R$ has a Poisson distribution with parameter $λa (R)$ (where $λ$ is the expected number of events per unit area) and that nonoverlapping regions yield independent $X$ ’s.

Suppose an ecologist selects $n$ nonoverlapping regions $R_{1}, \dots, R_{n}$ and counts the number of plants of a certain species found in each region. The joint pmf (likelihood) is then

p (x_{1}, \dots, x_{n}; λ) = \frac{[ λ \cdot a ( R _{1} ) ] ^{x_{1}} e ^{- λ \cdot a (R_{1})}}{x _{1} !} \cdot \dots \cdot \frac{[ λ \cdot a ( R _{n} ) ] ^{x_{n}} e ^{- λ \cdot a (R_{n})}}{x _{n} !}

= \frac{[ a ( R _{1} ) ] ^{x_{1}} \cdot \dots \cdot [ a ( R _{n} ) ] ^{x_{n}} \cdot λ ^{\sum x_{i}} \cdot e ^{- λ \sum a (R_{i})}}{x _{1} ! \cdot \dots \cdot x _{n} !}

The log likelihood is

ln [p (x_{1}, \dots, x_{n}; λ)] = \sum x_{i} \cdot ln [a (R_{i})] + ln (λ) \cdot \sum x_{i} - λ \sum a (R_{i}) - \sum ln (x_{i}!)

Taking $d / d λ [ln (p)]$ and equating it to zero yields

\frac{\sum x _{i}}{λ} - \sum a (R_{i}) = 0

from which

λ = \frac{\sum x _{i}}{\sum a ( R _{i} )}

The mle is then $λ = \sum X_{i} / \sum a (R_{i})$ . This is intuitively reasonable because $λ$ is the true density (plants per unit area), whereas $λ$ is the sample density since $\sum a (R_{i})$ is just the total area sampled. Because $E (X_{i}) = λ \cdot a (R_{i})$ , the estimator is unbiased.

Sometimes an alternative sampling procedure is used. Instead of fixing regions to be sampled, the ecologist will select $n$ points in the entire region of interest and let $y_{i} =$ the distance from the $i$ th point to the nearest plant. The cumulative distribution function (cdf) of $Y =$ distance to the nearest plant is

F_{Y} (y) = P (Y \leq y) = 1 - P (Y > y) = 1 - P (no plants in a circle of radius y)

= 1 - \frac{e ^{- λπ y^{2}} ( λπ y ^{2} ) ^{0}}{0 !} = 1 - e^{- λ \cdot π y^{2}}

Taking the derivative of $F_{Y} (y)$ with respect to $y$ yields

f_{Y} (y; λ) = {2 πλ y e^{- λπ y^{2}} 0 y \geq 0 otherwise

If we now form the likelihood $f_{Y} (y_{1}; λ) \cdot \dots \cdot f_{Y} (y_{n}; λ)$ , differentiate ln(likelihood), and so on, the resulting mle is

λ = \frac{n}{π \sum Y _{i}^{2}} = \frac{number of plants observed}{total area sampled}

which is also a sample density. It can be shown that in a sparse environment (small $λ$ ), the distance method is in a certain sense better, whereas in a dense environment the first sampling method is better.

EXAMPLE 6.19 Let $X_{1}, \dots, X_{n}$ be a random sample from a Weibull pdf

f (x; α, β) = {\frac{α}{β ^{α}} \cdot x^{α - 1} \cdot e^{- (x / β)^{α}} 0 x \geq 0 otherwise

Writing the likelihood and ln(likelihood), then setting both $(\partial / \partial α) [ln (f)] = 0$ and $(\partial / \partial β) [ln (f)] = 0$ , yields the equations

α = [\frac{\sum x _{i}^{α} \cdot ln ( x _{i} )}{\sum x _{i}^{α}} - \frac{\sum ln ( x _{i} )}{n}]^{- 1} β = (\frac{\sum x _{i}^{α}}{n})^{1/ α}

These two equations cannot be solved explicitly to give general formulas for the mle’s $α$ and $β$ . Instead, for each sample $x_{1}, \dots, x_{n}$ , the equations must be solved using an iterative numerical procedure. The R, SAS and Minitab software packages can be used for this purpose. Even moment estimators of $α$ and $β$ are somewhat complicated (see Exercise 21).

6.2.3 Estimating Functions of Parameters

Once the mle for a parameter $θ$ is available, the mle for any function of $θ$ , such as $1/ θ$ or $θ$ , is easily obtained.

PROPOSITION

The Invariance Principle

Let $θ_{1}, θ_{2}, \dots, θ_{m}$ be the mle’s of the parameters $θ_{1}, θ_{2}, \dots, θ_{m}$ . Then the mle of any function $h (θ_{1}, θ_{2}, \dots, θ_{m})$ of these parameters is the function $h (θ_{1}, θ_{2}, \dots, θ_{m})$ of the mle’s.

continued) function:

In the normal case, the mle’s of $μ$ and $σ^{2}$ are $μ = \overset{ˉ}{X}$ and $σ^{2} = \sum (X_{i} - \overset{ˉ}{X})^{2} / n$ . To obtain the mle of the function $h (μ, σ^{2}) = σ^{2} = σ$ , substitute the mle’s into the

σ = σ^{2} = [\frac{1}{n} \sum (X_{i} - \overset{ˉ}{X})^{2}]^{1/2}

The mle of $σ$ is not the sample standard deviation $S$ , though they are close unless $n$ is quite small.

EXAMPLE 6.21 The mean value of an rv $X$ that has a Weibull distribution is

(Example 6.19

continued)

μ = β \cdot Γ (1 + 1/ α)

The mle of $μ$ is therefore $μ = β Γ (1 + 1/ α)$ , where $α$ and $β$ are the mle’s of $α$ and $β$ . In particular, $\overset{ˉ}{X}$ is not the mle of $μ$ , though it is an unbiased estimator. At least for large $n, μ$ is a better estimator than $\overset{ˉ}{X}$ .

For the data given in Example 6.3, the mle’s of the Weibull parameters are $α = 11.9731$ and $β = 77.0153$ , from which $μ = 73.80$ . This estimate is quite close to the sample mean 73.88.

6.2.4 Large Sample Behavior of the MLE

Although the principle of maximum likelihood estimation has considerable intuitive appeal, the following proposition provides additional rationale for the use of mle’s.

PROPOSITION

Under very general conditions on the joint distribution of the sample, when the sample size $n$ is large, the maximum likelihood estimator of any parameter $θ$ is at least approximately unbiased $[E (θ) \approx θ]$ and has variance that is either as small as or nearly as small as can be achieved by any estimator. Stated another way, the mle $θ$ is either exactly or at least approximately the MVUE of $θ$ .

Because of this result and the fact that calculus-based techniques can usually be used to derive the mle’s (though often numerical methods, such as Newton’s method, are necessary), maximum likelihood estimation is the most widely used estimation technique among statisticians. Many of the estimators used in the remainder of the book are mle’s. Obtaining an mle, however, does require that the underlying distribution be specified.

6.2.5 Some Complications

Sometimes calculus cannot be used to obtain mle’s.

EXAMPLE 6.22

2 Suppose waiting time for a bus is uniformly distributed on $[0, θ]$ and the results $x_{1}, \dots, x_{n}$ of a random sample from this distribution have been observed. Since $f (x; θ) = 1/ θ$ for $0 \leq x \leq θ$ and 0 otherwise,

f (x_{1}, \dots, x_{n}; θ) = {\frac{1}{θ ^{n}} 0 0 \leq x_{1} \leq θ, \dots, 0 \leq x_{n} \leq θ otherwise

As long as $max (x_{i}) \leq θ$ , the likelihood is $1/ θ^{n}$ , which is positive, but as soon as $θ < max (x_{i})$ , the likelihood drops to 0 . This is illustrated in Figure 6.7. Calculus will not work because the maximum of the likelihood occurs at a point of discontinuity, but the figure shows that $θ = max (X_{i})$ . Thus if my waiting times are $2.3, 3.7, 1.5, .4$ , and 3.2, then the mle is $θ = 3.7$ . From Example 6.4, the mle is not unbiased. Copyright 2016 Congage Learning. All Rights Reserved, May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/oreChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Congage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.

019264b3-14d7-793b-ad40-3bd36b82fa05_24_847_1907_702_263_0.jpg

Figure 6.7 The likelihood function for Example 6.22

EXAMPLE 6.23 A method that is often used to estimate the size of a wildlife population involves performing a capture/recapture experiment. In this experiment, an initial sample of $M$ animals is captured, each of these animals is tagged, and the animals are then returned to the population. After allowing enough time for the tagged individuals to mix into the population, another sample of size $n$ is captured. With $X =$ the number of tagged animals in the second sample, the objective is to use the observed $x$ to estimate the population size $N$ .

The parameter of interest is $θ = N$ , which can assume only integer values, so even after determining the likelihood function (pmf of $X$ here), using calculus to obtain $N$ would present difficulties. If we think of a success as a previously tagged animal being recaptured, then sampling is without replacement from a population containing $M$ successes and $N - M$ failures, so that $X$ is a hypergeometric rv and the likelihood function is

p (x; N) = h (x; n, M, N) = \frac{( M x ) \cdot ( N - M n - x )}{( N n )}

The integer-valued nature of $N$ notwithstanding, it would be difficult to take the derivative of $p (x; N)$ . However, if we consider the ratio of $p (x; N)$ to $p (x; N - 1)$ , we have

\frac{p ( x ; N )}{p ( x ; N - 1 )} = \frac{( N - M ) \cdot ( N - n )}{N ( N - M - n + x )}

This ratio is larger than 1 if and only if (iff) $N < M n / x$ . The value of $N$ for which $p (x; N)$ is maximized is therefore the largest integer less than $M n / x$ . If we use standard mathematical notation $[r]$ for the largest integer less than or equal to $r$ , the mle of $N$ is $N = [M n / x]$ . As an illustration, if $M = 200$ fish are taken from a lake and tagged, and subsequently $n = 100$ fish are recaptured, and among the 100 there are $x = 11$ tagged fish, then $N = [(200) (100) / 11] = [1818.18] = 1818$ . The estimate is actually rather intuitive; $x / n$ is the proportion of the recaptured sample that is tagged, whereas $M / N$ is the proportion of the entire population that is tagged. The estimate is obtained by equating these two proportions (estimating a population proportion by a sample proportion).

Suppose $X_{1}, X_{2}, \dots, X_{n}$ is a random sample from a pdf $f (x; θ)$ that is symmetric about $θ$ but that the investigator is unsure of the form of the $f$ function. It is then desirable to use an estimator $θ$ that is robust-that is, one that performs well for a wide variety of underlying pdf’s. One such estimator is a trimmed mean. In recent years, statisticians have proposed another type of estimator, called an $M$ -estimator; based on a generalization of maximum likelihood estimation. Instead of maximizing the log likelihood $\sum ln [f (x; θ)]$ for a specified $f$ , one maximizes $\sum ρ (x_{i}; θ)$ . The “objective function” $ρ$ is selected to yield an estimator with good robustness properties. The book by David Hoaglin et al. (see the bibliography) contains a good exposition of this topic.

EXERCISES Section 6.2 (20–30)

A diagnostic test for a certain disease is applied to $n$ individuals known to not have the disease. Let $X =$ the number among the $n$ test results that are positive (indicating presence of the disease, so $X$ is the number of false positives) and $p =$ the probability that a disease-free individual’s test result is positive (i.e., $p$ is the true proportion of test results from disease-free individuals that are positive). Assume that only $X$ is available rather than the actual sequence of test results.

a. Derive the maximum likelihood estimator of $p$ . If $n = 20$ and $x = 3$ , what is the estimate?

b. Is the estimator of part (a) unbiased?

c. If $n = 20$ and $x = 3$ , what is the mle of the probability $(1 - p)^{5}$ that none of the next five tests done on disease-free individuals are positive?

Let $X$ have a Weibull distribution with parameters $α$ and $β$ , so

E (X) = β \cdot Γ (1 + 1/ α)

V (X) = β^{2} {Γ (1 + 2/ α) - [Γ (1 + 1/ α)]^{2}}

a. Based on a random sample $X_{1}, \dots, X_{n}$ , write equations for the method of moments estimators of $β$ and $α$ . Show that, once the estimate of $α$ has been obtained, the estimate of $β$ can be found from a table of the gamma function and that the estimate of $α$ is the solution to a complicated equation involving the gamma function.

b. If $n = 20, \overset{x}{ˉ} = 28.0$ , and $\sum x_{i}^{2} = 16, 500$ , compute the estimates. [Hint: $[Γ (1.2)]^{2} /Γ (1.4) = .95$ .]

Let $X$ denote the proportion of allotted time that a randomly selected student spends working on a certain aptitude test. Suppose the pdf of $X$ is

f (x; θ) = {(θ + 1) x^{θ} 0 0 \leq x \leq 1 otherwise

where $- 1 < θ$ . A random sample of ten students yields data $x_{1} = .92, x_{2} = .79, x_{3} = .90, x_{4} = .65, x_{5} = .86$ , $x_{6} = .47, x_{7} = .73, x_{8} = .97, x_{9} = .94, x_{10} = .77$ .

a. Use the method of moments to obtain an estimator of $θ$ , and then compute the estimate for this data.

b. Obtain the maximum likelihood estimator of $θ$ , and then compute the estimate for the given data.

Let $X$ represent the error in making a measurement of a physical characteristic or property (e.g., the boiling point of a particular liquid). It is often reasonable to assume that $E (X) = 0$ and that $X$ has a normal distribution. Thus the pdf of any particular measurement error is

f (x; θ) = \frac{1}{2 π θ} e^{- x^{2} / 2 θ} - \infty < x < \infty

(where we have used $θ$ in place of $σ^{2}$ ). Now suppose that $n$ independent measurements are made, resulting in measurement errors $X_{1} = x_{1}, X_{2} = x_{2}, \dots, X_{n} = x_{n}$ . Obtain the mle of $θ$ .

A vehicle with a particular defect in its emission control system is taken to a succession of randomly selected mechanics until $r = 3$ of them have correctly diagnosed the problem. Suppose that this requires diagnoses by 20 different mechanics (so there were 17 incorrect diagnoses). Let $p = P$ (correct diagnosis), so $p$ is the proportion of all mechanics who would correctly diagnose the problem. What is the mle of $p$ ? Is it the same as the mle if a random sample of 20 mechanics results in 3 correct diagnoses? Explain. How does the mle compare to the estimate resulting from the use of the unbiased estimator given in Exercise 17?
The shear strength of each of ten test spot welds is determined, yielding the following data (psi):

$392376401367389362409415358375$

a. Assuming that shear strength is normally distributed, estimate the true average shear strength and standard deviation of shear strength using the method of maximum likelihood.

b. Again assuming a normal distribution, estimate the strength value below which $95 %$ of all welds will have their strengths. [Hint: What is the 95th percentile in terms of $μ$ and $σ$ ? Now use the invariance principle.]

c. Suppose we decide to examine another test spot weld. Let $X =$ shear strength of the weld. Use the given data to obtain the mle of $P (X \leq 400)$ . [Hint: $P (X \leq 400) = Φ ((400 - μ) / σ) .]$

Consider randomly selecting $n$ segments of pipe and determining the corrosion loss (mm) in the wall thickness for each one. Denote these corrosion losses by $Y_{1}, \dots, Y_{n}$ . The article “A Probabilistic Model for a Gas Explosion Due to Leakages in the Grey Cast Iron Gas Mains” (Reliability Engr. and System Safety (2013:270-279) proposes a linear corrosion model: $Y_{i} = t_{i} R$ , where $t_{i}$ is the age of the pipe and $R$ , the corrosion rate, is exponentially distributed with parameter $λ$ . Obtain the maximum likelihood estimator of the exponential parameter (the resulting mle appears in the cited article). [Hint: If $c > 0$ and $X$ has an exponential distribution, so does $c X$ .]
Let $X_{1}, \dots, X_{n}$ be a random sample from a gamma distribution with parameters $α$ and $β$ .

a. Derive the equations whose solutions yield the maximum likelihood estimators of $α$ and $β$ . Do you think they can be solved explicitly?

b. Show that the mle of $μ = α β$ is $μ = \overset{ˉ}{X}$ .

Let $X_{1}, X_{2}, \dots, X_{n}$ represent a random sample from the Rayleigh distribution with density function given in Exercise 15. Determine

a. The maximum likelihood estimator of $θ$ , and then calculate the estimate for the vibratory stress data given in that exercise. Is this estimator the same as the unbiased estimator suggested in Exercise 15?

b. The mle of the median of the vibratory stress distribution. [Hint: First express the median in terms of $θ$ .]

Consider a random sample $X_{1}, X_{2}, \dots, X_{n}$ from the shifted exponential pdf

f (x; λ, θ) = {λ e^{- λ (x - θ)} 0 x \geq θ otherwise

Taking $θ = 0$ gives the pdf of the exponential distribution considered previously (with positive density to the right of zero). An example of the shifted exponential distribution appeared in Example 4.5, in which the variable of interest was time headway in traffic flow and $θ = .5$ was the minimum possible time headway.

a. Obtain the maximum likelihood estimators of $θ$ and $λ$ .

b. If $n = 10$ time headway observations are made, resulting in the values $3.11, .64, 2.55, 2.20, 5.44$ , $3.42, 10.39, 8.93, 17.82$ , and 1.30, calculate the estimates of $θ$ and $λ$ .

At time $t = 0, 20$ identical components are tested. The lifetime distribution of each is exponential with parameter $λ$ . The experimenter then leaves the test facility unmonitored. On his return 24 hours later, the experimenter immediately terminates the test after noticing that $y = 15$ of the 20 components are still in operation (so 5 have failed). Derive the mle of $λ$ . [Hint: Let $Y =$ the number that survive 24 hours. Then $Y \sim Bin (n, p)$ . What is the mle of $p$ ? Now notice that $p = P (X_{i} \geq 24)$ , where $X_{i}$ is exponentially distributed. This relates $λ$ to $p$ , so the former can be estimated once the latter has been.]

Youliang Zhong

Table of Contents

Backlinks

Graph View

6.2 Methods of Point Estimation