6.2 Methods of Point Estimation

We now introduce two “constructive” methods for obtaining point estimators: the method of moments and the method of maximum likelihood. By constructive we mean that the general definition of each type of estimator suggests explicitly how to obtain the estimator in any specific problem. Although maximum likelihood estimators are generally preferable to moment estimators because of certain efficiency properties, they often require significantly more computation than do moment estimators. It is sometimes the case that these methods yield unbiased estimators.

6.2.1 The Method of Moments

The basic idea of this method is to equate certain sample characteristics, such as the mean, to the corresponding population expected values. Then solving these equations for unknown parameter values yields the estimators.

DEFINITION Let $X_1, \ldots, X_n$ be a random sample from a pmf or pdf $f(x)$. For $k = 1, 2, 3, \ldots$,

the $k$th population moment, or $k$th moment of the distribution $f(x)$,

is $E(X^k)$. The $k$th sample moment is $\frac{1}{n}\sum_{i=1}^{n} X_i^k$.

Thus the first population moment is $E(X) = \mu$, and the first sample moment is $\frac{1}{n}\sum X_i = \bar{X}$. The second population and sample moments are $E(X^2)$ and $\frac{1}{n}\sum X_i^2$, respectively. The population moments will be functions of any unknown parameters $\theta_1, \theta_2, \ldots$.

DEFINITION Let $X_1, \ldots, X_n$ be a random sample from a distribution with pmf or pdf $f(x; \theta_1, \ldots, \theta_m)$, where $\theta_1, \ldots, \theta_m$ are parameters whose values are unknown. Then the moment estimators $\hat{\theta}_1, \ldots, \hat{\theta}_m$ are obtained by equating the first $m$ sample moments to the corresponding first $m$ population moments and solving for $\theta_1, \ldots, \theta_m$.

If, for example, $m = 2$, then $E(X)$ and $E(X^2)$ will be functions of $\theta_1$ and $\theta_2$. Setting $E(X) = \frac{1}{n}\sum X_i = \bar{X}$ and $E(X^2) = \frac{1}{n}\sum X_i^2$ gives two equations in $\theta_1$ and $\theta_2$. The solution then defines the estimators.

EXAMPLE 6.12 Let $X_1, \ldots, X_n$ represent a random sample of service times of $n$ customers at a certain facility, where the underlying distribution is assumed exponential with parameter $\lambda$. Since there is only one parameter to be estimated, the estimator is obtained by equating $E(X)$ to $\bar{X}$. Since $E(X) = 1/\lambda$ for an exponential distribution, this gives $1/\lambda = \bar{X}$, or $\lambda = 1/\bar{X}$. The moment estimator of $\lambda$ is then $\hat{\lambda} = 1/\bar{X}$.
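The computation is simple enough to sketch in a few lines of code. Here is a minimal Python sketch of this moment estimator; the service times are hypothetical values, not data from the text.

```python
# Moment estimator for the exponential parameter: lambda-hat = 1 / x-bar.
times = [2.3, 1.1, 0.8, 3.5, 1.9, 2.7]   # hypothetical service times

xbar = sum(times) / len(times)           # first sample moment
lam_hat = 1 / xbar                       # equate E(X) = 1/lambda to x-bar and solve
print(f"x-bar = {xbar:.3f}, lambda-hat = {lam_hat:.3f}")
```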

EXAMPLE 6.13 Let $X_1, \ldots, X_n$ be a random sample from a gamma distribution with parameters $\alpha$ and $\beta$. From Section 4.4, $E(X) = \alpha\beta$ and $E(X^2) = \alpha(\alpha + 1)\beta^2$. The moment estimators of $\alpha$ and $\beta$ are obtained by solving

$$\bar{X} = \alpha\beta \qquad \frac{1}{n}\sum X_i^2 = \alpha(\alpha + 1)\beta^2$$

Since $\alpha(\alpha + 1)\beta^2 = \alpha^2\beta^2 + \alpha\beta^2$ and the first equation implies $\alpha^2\beta^2 = \bar{X}^2$, the second equation becomes

$$\frac{1}{n}\sum X_i^2 = \bar{X}^2 + \alpha\beta^2$$

Now dividing each side of this second equation by the corresponding side of the first equation and substituting back gives the estimators

$$\hat{\alpha} = \frac{\bar{X}^2}{\frac{1}{n}\sum X_i^2 - \bar{X}^2} \qquad \hat{\beta} = \frac{\frac{1}{n}\sum X_i^2 - \bar{X}^2}{\bar{X}}$$

To illustrate, consider the survival-time data mentioned in Example 4.24. Computing $\bar{x}$ and $\frac{1}{n}\sum x_i^2$ from those observations and substituting into the foregoing formulas yields the parameter estimates $\hat{\alpha}$ and $\hat{\beta}$. These estimates of $\alpha$ and $\beta$ differ from the values suggested by Gross and Clark because they used a different estimation technique.
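As a quick check of these formulas, here is a short Python sketch computing $\hat{\alpha}$ and $\hat{\beta}$ from a sample; the data values below are hypothetical.

```python
# Method of moments for the gamma distribution.
x = [12.1, 7.4, 9.8, 15.2, 11.0, 8.7, 13.5, 10.3]   # hypothetical data
n = len(x)

xbar = sum(x) / n                      # first sample moment
m2 = sum(v * v for v in x) / n         # second sample moment
alpha_hat = xbar**2 / (m2 - xbar**2)
beta_hat = (m2 - xbar**2) / xbar
print(f"alpha-hat = {alpha_hat:.2f}, beta-hat = {beta_hat:.2f}")
```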

EXAMPLE 6.14 Let $X_1, \ldots, X_n$ be a random sample from a generalized negative binomial distribution with parameters $r$ and $p$ (see Section 3.5). Since $E(X) = r(1 - p)/p$ and $V(X) = r(1 - p)/p^2$, equating $E(X)$ to $\bar{X}$ and $E(X^2)$ to $\frac{1}{n}\sum X_i^2$ eventually gives

$$\hat{p} = \frac{\bar{X}}{\frac{1}{n}\sum X_i^2 - \bar{X}^2} \qquad \hat{r} = \frac{\bar{X}^2}{\frac{1}{n}\sum X_i^2 - \bar{X}^2 - \bar{X}}$$

As an illustration, Reep, Pollard, and Benjamin (“Skill and Chance in Ball Games,” J. of Royal Stat. Soc., 1971: 623-629) consider the negative binomial distribution as a model for the number of goals per game scored by National Hockey League teams. The data for 1966-1967 follows (420 games):

| Goals     | 0  | 1  | 2  | 3  | 4  | 5  | 6  | 7 | 8 | 9 | 10 |
|-----------|----|----|----|----|----|----|----|---|---|---|----|
| Frequency | 29 | 71 | 82 | 89 | 65 | 45 | 24 | 7 | 4 | 1 | 3  |

Then,

$$\bar{x} = \frac{\sum x_i}{420} = \frac{(0)(29) + (1)(71) + \cdots + (10)(3)}{420} = 2.98$$

and

$$\frac{1}{420}\sum x_i^2 = \frac{(0)^2(29) + (1)^2(71) + \cdots + (10)^2(3)}{420} = 12.40$$

Thus,

$$\hat{p} = \frac{2.98}{12.40 - (2.98)^2} = .85 \qquad \hat{r} = \frac{(2.98)^2}{12.40 - (2.98)^2 - 2.98} = 16.5$$

Although by definition $r$ must be positive, the denominator of $\hat{r}$ could be negative, indicating that the negative binomial distribution is not appropriate (or that the moment estimator is flawed).
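The entire calculation for the goals data can be reproduced with a few lines of Python; this sketch works directly from the frequency table above.

```python
# Moment estimates for the negative binomial model of goals per game.
goals = list(range(11))
freq = [29, 71, 82, 89, 65, 45, 24, 7, 4, 1, 3]    # 420 games in all
n = sum(freq)

xbar = sum(g * f for g, f in zip(goals, freq)) / n       # = 2.98
m2 = sum(g * g * f for g, f in zip(goals, freq)) / n     # = 12.40
p_hat = xbar / (m2 - xbar**2)                            # about .85
r_hat = xbar**2 / (m2 - xbar**2 - xbar)                  # about 16.5
print(f"p-hat = {p_hat:.2f}, r-hat = {r_hat:.1f}")
```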

6.2.2 Maximum Likelihood Estimation

The method of maximum likelihood was first introduced by R. A. Fisher, a geneticist and statistician, in the 1920s. Most statisticians recommend this method, at least when the sample size is large, since the resulting estimators have certain desirable efficiency properties (see the proposition on page 271).

EXAMPLE 6.15 The best protection against hacking into an online account is to use a password that has at least 8 characters consisting of upper- and lowercase letters, numerals, and special characters. [Note: The Jan. 2012 issue of Consumer Reports reported that only a minority of individuals surveyed used a strong password.] Suppose that 10 individuals who have email accounts with a certain provider are selected, and it is found that the first, third, and tenth individuals have such strong protection, whereas the others do not. Let $p = P(\text{strong protection})$, i.e., $p$ is the proportion of all such account holders having strong protection. Define (Bernoulli) random variables $X_1, X_2, \ldots, X_{10}$ by

$$X_i = \begin{cases} 1 & \text{if individual } i \text{ has strong protection} \\ 0 & \text{otherwise} \end{cases} \qquad i = 1, \ldots, 10$$

Then for the obtained sample, $x_1 = x_3 = x_{10} = 1$ and the other seven $x_i$'s are all zero. The probability mass function of any particular $X_i$ is $p^{x_i}(1 - p)^{1 - x_i}$, which becomes $p$ if $x_i = 1$ and $1 - p$ when $x_i = 0$. Now suppose that the conditions of various passwords are independent of one another. This implies that the $X_i$'s are independent, so their joint probability mass function is the product of the individual pmf's. Thus the joint pmf evaluated at the observed $x_i$'s is

$$f(x_1, \ldots, x_{10}; p) = p(1 - p)p \cdots p = p^3(1 - p)^7 \qquad (6.4)$$

Suppose that $p = .25$. Then the probability of observing the sample that we actually obtained is $(.25)^3(.75)^7 = .002086$. If instead $p = .50$, then this probability is $(.50)^3(.50)^7 = .000977$. For what value of $p$ is the obtained sample most likely to have occurred? That is, for what value of $p$ is the joint pmf (6.4) as large as it can be? What value of $p$ maximizes (6.4)? Figure 6.6(a) shows a graph of the likelihood (6.4) as a function of $p$. It appears that the graph reaches its peak above $p = .30$, the proportion of individuals in the sample with strong protection. Figure 6.6(b) shows a graph of the natural logarithm of (6.4); since $\ln[g(u)]$ is a strictly increasing function of $g(u)$, finding $u$ to maximize $\ln[g(u)]$ is the same as finding $u$ to maximize $g(u)$.


Figure 6.6 (a) Graph of the likelihood (joint pmf) (6.4) from Example 6.15 (b) Graph of the natural logarithm of the likelihood

We can verify our visual impression by using calculus to find the value of $p$ that maximizes (6.4). Working with the natural logarithm of the joint pmf is often easier than working with the joint pmf itself, since the joint pmf is typically a product, so its logarithm will be a sum. Here

$$\ln[f(x_1, \ldots, x_{10}; p)] = \ln[p^3(1 - p)^7] = 3\ln(p) + 7\ln(1 - p)$$

Thus

$$\frac{d}{dp}\{\ln[f(x_1, \ldots, x_{10}; p)]\} = \frac{d}{dp}\{3\ln(p) + 7\ln(1 - p)\} = \frac{3}{p} - \frac{7}{1 - p}$$

[the $-7/(1 - p)$ comes from the chain rule in calculus]. Equating this derivative to 0 and solving for $p$ gives $3(1 - p) = 7p$, from which $3 = 10p$ and so $p = 3/10 = .30$, as conjectured. That is, our point estimate is $\hat{p} = .30$. It is called the maximum likelihood estimate because it is the parameter value that maximizes the likelihood (joint pmf) of the observed sample. In general, the second derivative should be examined to make sure a maximum has been obtained, but here this is obvious from Figure 6.6.

Suppose that rather than being told the condition of every password, we had only been informed that three of the ten were strong. Then we would have the observed value of a binomial random variable $X =$ the number with strong passwords. The pmf of $X$ is $\binom{10}{x}p^x(1 - p)^{10 - x}$. For $x = 3$, this becomes $\binom{10}{3}p^3(1 - p)^7$. The binomial coefficient $\binom{10}{3}$ is irrelevant to the maximization, so again $\hat{p} = .30$.
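The calculus above can be corroborated numerically. A minimal Python sketch: scan a grid of $p$ values and pick the one maximizing the log likelihood $3\ln(p) + 7\ln(1 - p)$; the maximizer lands at $x/n = .30$.

```python
import math

x, n = 3, 10                              # 3 of 10 individuals had strong passwords

def log_lik(p):
    # log of the likelihood p^x (1-p)^(n-x)
    return x * math.log(p) + (n - x) * math.log(1 - p)

grid = [i / 10000 for i in range(1, 10000)]   # p values strictly between 0 and 1
p_hat = max(grid, key=log_lik)
print(p_hat)                                  # 0.3, i.e. x/n
```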


DEFINITION


Let $X_1, X_2, \ldots, X_n$ have joint pmf or pdf

$$f(x_1, x_2, \ldots, x_n; \theta_1, \ldots, \theta_m) \qquad (6.6)$$

where the parameters $\theta_1, \ldots, \theta_m$ have unknown values. When $x_1, \ldots, x_n$ are the observed sample values and (6.6) is regarded as a function of $\theta_1, \ldots, \theta_m$, it is called the likelihood function. The maximum likelihood estimates (mle's) $\hat{\theta}_1, \ldots, \hat{\theta}_m$ are those values of the $\theta_i$'s that maximize the likelihood function, so that

$$f(x_1, \ldots, x_n; \hat{\theta}_1, \ldots, \hat{\theta}_m) \ge f(x_1, \ldots, x_n; \theta_1, \ldots, \theta_m) \quad \text{for all } \theta_1, \ldots, \theta_m$$

When the $X_i$'s are substituted in place of the $x_i$'s, the maximum likelihood estimators result.

The likelihood function tells us how likely the observed sample is as a function of the possible parameter values. Maximizing the likelihood gives the parameter values for which the observed sample is most likely to have been generated; that is, the parameter values that "agree most closely" with the observed data.

EXAMPLE 6.16 Suppose $X_1, X_2, \ldots, X_n$ is a random sample from an exponential distribution with parameter $\lambda$. Because of independence, the likelihood function is a product of the individual pdf's:

$$f(x_1, \ldots, x_n; \lambda) = (\lambda e^{-\lambda x_1}) \cdots (\lambda e^{-\lambda x_n}) = \lambda^n e^{-\lambda \sum x_i}$$

The natural logarithm of the likelihood function is

$$\ln[f(x_1, \ldots, x_n; \lambda)] = n\ln(\lambda) - \lambda \sum x_i$$

Equating the derivative $n/\lambda - \sum x_i$ to zero results in $\lambda = n/\sum x_i = 1/\bar{x}$. Thus the mle is $\hat{\lambda} = 1/\bar{X}$; it is identical to the method of moments estimator [but it is not an unbiased estimator, since $E(1/\bar{X}) \ne 1/E(\bar{X})$].
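The bias claim is easy to see by simulation. A minimal sketch, assuming a true $\lambda = 2$ and small samples of size $n = 5$: the average of $\hat{\lambda} = 1/\bar{X}$ over many samples exceeds 2 [in fact $E(\hat{\lambda}) = n\lambda/(n - 1)$].

```python
import random

random.seed(1)
lam, n, reps = 2.0, 5, 20000
total = 0.0
for _ in range(reps):
    sample = [random.expovariate(lam) for _ in range(n)]
    total += n / sum(sample)              # mle: 1 / x-bar
print(total / reps)                       # near 2.5 = n*lam/(n - 1), not 2.0
```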

EXAMPLE 6.17 Let $X_1, \ldots, X_n$ be a random sample from a normal distribution. The likelihood function is

$$f(x_1, \ldots, x_n; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(x_1 - \mu)^2/(2\sigma^2)} \cdots \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(x_n - \mu)^2/(2\sigma^2)} = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} e^{-\sum(x_i - \mu)^2/(2\sigma^2)}$$

so

$$\ln[f(x_1, \ldots, x_n; \mu, \sigma^2)] = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum(x_i - \mu)^2$$

To find the maximizing values of $\mu$ and $\sigma^2$, we must take the partial derivatives of $\ln(f)$ with respect to $\mu$ and $\sigma^2$, equate them to zero, and solve the resulting two equations. Omitting the details, the resulting mle's are

$$\hat{\mu} = \bar{X} \qquad \hat{\sigma}^2 = \frac{\sum(X_i - \bar{X})^2}{n}$$

The mle of $\sigma^2$ is not the unbiased estimator $S^2$, so two different principles of estimation (unbiasedness and maximum likelihood) yield two different estimators.
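The two estimators of $\sigma^2$ differ only in the divisor, as this minimal Python sketch (with hypothetical data) makes explicit.

```python
x = [4.2, 5.1, 3.8, 4.9, 5.6, 4.4]        # hypothetical sample
n = len(x)

mu_hat = sum(x) / n                        # mle of mu: the sample mean
ss = sum((v - mu_hat) ** 2 for v in x)
sigma2_mle = ss / n                        # mle of sigma^2 (divides by n)
s2 = ss / (n - 1)                          # unbiased estimator S^2 (divides by n - 1)
print(sigma2_mle, s2)
```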

EXAMPLE 6.18 In Chapter 3, we mentioned the use of the Poisson distribution for modeling the number of "events" that occur in a two-dimensional region. Assume that when the region $R$ being sampled has area $a(R)$, the number $X$ of events occurring in $R$ has a Poisson distribution with parameter $\lambda a(R)$ (where $\lambda$ is the expected number of events per unit area) and that nonoverlapping regions yield independent $X$'s.

Suppose an ecologist selects $n$ nonoverlapping regions $R_1, \ldots, R_n$ and counts the number of plants of a certain species found in each region. The joint pmf (likelihood) is then

$$p(x_1, \ldots, x_n; \lambda) = \frac{[\lambda a(R_1)]^{x_1} e^{-\lambda a(R_1)}}{x_1!} \cdots \frac{[\lambda a(R_n)]^{x_n} e^{-\lambda a(R_n)}}{x_n!} = \frac{[a(R_1)]^{x_1} \cdots [a(R_n)]^{x_n} \cdot \lambda^{\sum x_i} \cdot e^{-\lambda \sum a(R_i)}}{x_1! \cdots x_n!}$$

The log likelihood is

$$\ln[p(x_1, \ldots, x_n; \lambda)] = \sum x_i \ln[a(R_i)] + \ln(\lambda) \sum x_i - \lambda \sum a(R_i) - \sum \ln(x_i!)$$

Taking $d/d\lambda$ and equating it to zero yields

$$\frac{\sum x_i}{\lambda} - \sum a(R_i) = 0$$

from which

$$\lambda = \frac{\sum x_i}{\sum a(R_i)}$$

The mle is then $\hat{\lambda} = \sum X_i / \sum a(R_i)$. This is intuitively reasonable because $\lambda$ is the true density (plants per unit area), whereas $\hat{\lambda}$ is the sample density, since $\sum a(R_i)$ is just the total area sampled. Because $E(\sum X_i) = \lambda \sum a(R_i)$, the estimator is unbiased.
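A minimal sketch of this estimator, with hypothetical counts and region areas:

```python
# mle of plant density: total count divided by total area sampled.
counts = [3, 0, 2, 5, 1]                 # plants found in each region (hypothetical)
areas = [1.0, 0.5, 1.5, 2.0, 1.0]        # areas a(R_i), in the same units

lam_hat = sum(counts) / sum(areas)       # unbiased: E(sum X_i) = lambda * sum a(R_i)
print(lam_hat)
```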

Sometimes an alternative sampling procedure is used. Instead of fixing regions to be sampled, the ecologist will select $n$ points in the entire region of interest and let $y_i =$ the distance from the $i$th point to the nearest plant. The cumulative distribution function (cdf) of $Y =$ distance to the nearest plant is

$$F_Y(y) = P(Y \le y) = 1 - P(Y > y) = 1 - P\left(\text{no plants in a circle of radius } y\right) = 1 - \frac{e^{-\lambda\pi y^2}(\lambda\pi y^2)^0}{0!} = 1 - e^{-\lambda\pi y^2}$$

Taking the derivative of $F_Y(y)$ with respect to $y$ yields

$$f_Y(y; \lambda) = 2\pi\lambda y e^{-\lambda\pi y^2} \qquad y \ge 0$$

If we now form the likelihood $f_Y(y_1; \lambda) \cdots f_Y(y_n; \lambda)$, differentiate ln(likelihood), and so on, the resulting mle is

$$\hat{\lambda} = \frac{n}{\pi \sum Y_i^2}$$

which is also a sample density. It can be shown that in a sparse environment (small $\lambda$), the distance method is in a certain sense better, whereas in a dense environment the first sampling method is better.
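The distance-method estimator is equally short to compute; here is a sketch with hypothetical point-to-nearest-plant distances.

```python
import math

y = [0.31, 0.18, 0.45, 0.27, 0.39]       # hypothetical distances to nearest plant
lam_hat = len(y) / (math.pi * sum(v * v for v in y))
print(lam_hat)
```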

EXAMPLE 6.19 Let $X_1, \ldots, X_n$ be a random sample from a Weibull pdf

$$f(x; \alpha, \beta) = \frac{\alpha}{\beta^\alpha} \cdot x^{\alpha - 1} \cdot e^{-(x/\beta)^\alpha} \qquad x \ge 0$$

Writing the likelihood and ln(likelihood), then setting both $\partial/\partial\alpha[\ln(f)] = 0$ and $\partial/\partial\beta[\ln(f)] = 0$, yields the equations

$$\alpha = \left[\frac{\sum x_i^\alpha \ln(x_i)}{\sum x_i^\alpha} - \frac{\sum \ln(x_i)}{n}\right]^{-1} \qquad \beta = \left(\frac{\sum x_i^\alpha}{n}\right)^{1/\alpha}$$

These two equations cannot be solved explicitly to give general formulas for the mle's $\hat{\alpha}$ and $\hat{\beta}$. Instead, for each sample $x_1, \ldots, x_n$, the equations must be solved using an iterative numerical procedure. The R, SAS, and Minitab software packages can be used for this purpose. Even moment estimators of $\alpha$ and $\beta$ are somewhat complicated (see Exercise 21).
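To make the iterative solution concrete, here is a minimal Python sketch that solves the first equation for $\alpha$ by bisection and then substitutes into the second to get $\hat{\beta}$. The data are hypothetical, and the bracket $[0.1, 100]$ is assumed wide enough to contain the root.

```python
import math

x = [66.3, 79.1, 72.8, 58.9, 81.5, 70.0, 75.6, 68.4]   # hypothetical data
n = len(x)
mean_log = sum(math.log(v) for v in x) / n

def g(a):
    # Rearranged first mle equation; its root in alpha is the mle.
    s = sum(v ** a for v in x)
    sl = sum((v ** a) * math.log(v) for v in x)
    return sl / s - 1.0 / a - mean_log   # increasing in a, with g(lo) < 0 < g(hi)

lo, hi = 0.1, 100.0
for _ in range(100):                     # bisection
    mid = (lo + hi) / 2
    if g(mid) < 0:
        lo = mid
    else:
        hi = mid

alpha_hat = (lo + hi) / 2
beta_hat = (sum(v ** alpha_hat for v in x) / n) ** (1 / alpha_hat)
print(f"alpha-hat = {alpha_hat:.3f}, beta-hat = {beta_hat:.3f}")
```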

6.2.3 Estimating Functions of Parameters

Once the mle $\hat{\theta}$ for a parameter $\theta$ is available, the mle for any function of $\theta$, such as $1/\theta$ or $\sqrt{\theta}$, is easily obtained.


PROPOSITION


The Invariance Principle

Let $\hat{\theta}_1, \ldots, \hat{\theta}_m$ be the mle's of the parameters $\theta_1, \ldots, \theta_m$. Then the mle of any function $h(\theta_1, \ldots, \theta_m)$ of these parameters is the function $h(\hat{\theta}_1, \ldots, \hat{\theta}_m)$ of the mle's.


EXAMPLE 6.20 (Example 6.17 continued) In the normal case, the mle's of $\mu$ and $\sigma^2$ are $\hat{\mu} = \bar{X}$ and $\hat{\sigma}^2 = \sum(X_i - \bar{X})^2/n$. To obtain the mle of the function $h(\mu, \sigma^2) = \sqrt{\sigma^2} = \sigma$, substitute the mle's into the function:

$$\hat{\sigma} = \sqrt{\hat{\sigma}^2} = \left[\frac{1}{n}\sum(X_i - \bar{X})^2\right]^{1/2}$$

The mle of $\sigma$ is not the sample standard deviation $S$, though they are close unless $n$ is quite small.
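In code, the invariance principle is literally "apply the function to the mle", as this brief sketch (hypothetical data) shows; note the $n$ versus $n - 1$ divisor separating $\hat{\sigma}$ from $S$.

```python
import math

x = [4.2, 5.1, 3.8, 4.9, 5.6, 4.4]       # hypothetical sample
n = len(x)
xbar = sum(x) / n
ss = sum((v - xbar) ** 2 for v in x)

sigma_hat = math.sqrt(ss / n)            # mle of sigma, by invariance
s = math.sqrt(ss / (n - 1))              # sample standard deviation S
print(sigma_hat, s)                      # close except for quite small n
```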

EXAMPLE 6.21 (Example 6.19 continued) The mean value of an rv $X$ that has a Weibull distribution is

$$\mu = \beta \cdot \Gamma(1 + 1/\alpha)$$

The mle of $\mu$ is therefore $\hat{\mu} = \hat{\beta}\Gamma(1 + 1/\hat{\alpha})$, where $\hat{\alpha}$ and $\hat{\beta}$ are the mle's of $\alpha$ and $\beta$. In particular, $\bar{X}$ is not the mle of $\mu$, though it is an unbiased estimator. At least for large $n$, $\hat{\mu}$ is a better estimator than $\bar{X}$.

For the data given in Example 6.3, computing the mle's of the Weibull parameters numerically and substituting into $\hat{\beta}\Gamma(1 + 1/\hat{\alpha})$ gives an estimate of $\mu$ quite close to the sample mean 73.88.

6.2.4 Large Sample Behavior of the MLE

Although the principle of maximum likelihood estimation has considerable intuitive appeal, the following proposition provides additional rationale for the use of mle’s.


PROPOSITION


Under very general conditions on the joint distribution of the sample, when the sample size $n$ is large, the maximum likelihood estimator of any parameter $\theta$ is at least approximately unbiased [$E(\hat{\theta}) \approx \theta$] and has variance that is either as small as or nearly as small as can be achieved by any estimator. Stated another way, the mle $\hat{\theta}$ is either exactly or at least approximately the MVUE of $\theta$.

Because of this result and the fact that calculus-based techniques can usually be used to derive the mle’s (though often numerical methods, such as Newton’s method, are necessary), maximum likelihood estimation is the most widely used estimation technique among statisticians. Many of the estimators used in the remainder of the book are mle’s. Obtaining an mle, however, does require that the underlying distribution be specified.

6.2.5 Some Complications

Sometimes calculus cannot be used to obtain mle’s.


EXAMPLE 6.22


Suppose waiting time for a bus is uniformly distributed on $[0, \theta]$ and the results $x_1, \ldots, x_n$ of a random sample from this distribution have been observed. Since $f(x; \theta) = 1/\theta$ for $0 \le x \le \theta$ and 0 otherwise,

$$f(x_1, \ldots, x_n; \theta) = \begin{cases} \dfrac{1}{\theta^n} & 0 \le x_1 \le \theta, \ldots, 0 \le x_n \le \theta \\ 0 & \text{otherwise} \end{cases}$$

As long as $\max(x_i) \le \theta$, the likelihood is $1/\theta^n$, which is positive, but as soon as $\theta < \max(x_i)$, the likelihood drops to 0. This is illustrated in Figure 6.7. Calculus will not work because the maximum of the likelihood occurs at a point of discontinuity, but the figure shows that $\hat{\theta} = \max(X_i)$. Thus if the largest of the observed waiting times is 3.2, then the mle is $\hat{\theta} = 3.2$. From Example 6.4, the mle is not unbiased.


Figure 6.7 The likelihood function for Example 6.22
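No optimizer is needed here; the mle is just the sample maximum, as a two-line sketch (hypothetical waiting times) confirms.

```python
waits = [2.3, 1.5, 0.4, 3.2]    # hypothetical waiting times
theta_hat = max(waits)          # likelihood 1/theta^n is largest at theta = max(x_i)
print(theta_hat)                # 3.2
```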

EXAMPLE 6.23 A method that is often used to estimate the size of a wildlife population involves performing a capture/recapture experiment. In this experiment, an initial sample of $M$ animals is captured, each of these animals is tagged, and the animals are then returned to the population. After allowing enough time for the tagged individuals to mix into the population, another sample of size $n$ is captured. With $X =$ the number of tagged animals in the second sample, the objective is to use the observed $x$ to estimate the population size $N$.

The parameter of interest is $\theta = N$, which can assume only integer values, so even after determining the likelihood function (pmf of $X$ here), using calculus to obtain $N$ would present difficulties. If we think of a success as a previously tagged animal being recaptured, then sampling is without replacement from a population containing $M$ successes and $N - M$ failures, so that $X$ is a hypergeometric rv and the likelihood function is

$$p(x; N) = \frac{\binom{M}{x}\binom{N - M}{n - x}}{\binom{N}{n}}$$

The integer-valued nature of $N$ notwithstanding, it would be difficult to take the derivative of $p(x; N)$. However, if we consider the ratio of $p(x; N)$ to $p(x; N - 1)$, we have

$$\frac{p(x; N)}{p(x; N - 1)} = \frac{(N - M)(N - n)}{N(N - M - n + x)}$$

This ratio is larger than 1 if and only if (iff) $N < Mn/x$. The value of $N$ for which $p(x; N)$ is maximized is therefore the largest integer less than $Mn/x$. If we use standard mathematical notation $[r]$ for the largest integer less than or equal to $r$, the mle of $N$ is $\hat{N} = [Mn/x]$. As an illustration, if $M$ fish are taken from a lake and tagged, $n = 100$ fish are subsequently recaptured, and among the 100 there are $x$ tagged fish, then $\hat{N} = [100M/x]$. The estimate is actually rather intuitive; $x/n$ is the proportion of the recaptured sample that is tagged, whereas $M/N$ is the proportion of the entire population that is tagged. The estimate is obtained by equating these two proportions (estimating a population proportion by a sample proportion).
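A minimal sketch of the estimate, with hypothetical values for the number tagged $M$ and the recapture count $x$ (the recapture sample size $n = 100$ follows the illustration above):

```python
M, n2, x = 200, 100, 11     # hypothetical M and x; n2 = 100 animals recaptured
N_hat = (M * n2) // x       # largest integer <= M*n/x
print(N_hat)                # equates sample proportion x/n to population M/N
```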

Suppose $X_1, X_2, \ldots, X_n$ is a random sample from a pdf $f(x)$ that is symmetric about $\mu$, but the investigator is unsure of the form of the $f$ function. It is then desirable to use an estimator $\hat{\mu}$ that is robust; that is, one that performs well for a wide variety of underlying pdf's. One such estimator is a trimmed mean. In recent years, statisticians have proposed another type of estimator, called an M-estimator, based on a generalization of maximum likelihood estimation. Instead of maximizing the log likelihood $\sum \ln f(x_i; \theta)$ for a specified $f$, one maximizes $\sum \rho(x_i; \theta)$. The "objective function" $\rho$ is selected to yield an estimator with good robustness properties. The book by David Hoaglin et al. (see the bibliography) contains a good exposition of this topic.
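To illustrate the idea, here is a minimal Python sketch of an M-estimate of center using the Huber objective, one common choice of $\rho$ (written as a loss to be minimized, which is equivalent to maximizing its negative). The data, the cutoff $k = 1.345$, and the grid search standing in for a real optimizer are all illustrative assumptions.

```python
def rho(u, k=1.345):
    # Huber objective: quadratic near zero, linear in the tails,
    # so a gross outlier gets bounded influence.
    return 0.5 * u * u if abs(u) <= k else k * (abs(u) - 0.5 * k)

x = [4.1, 3.9, 4.3, 4.0, 4.2, 9.5]                 # one gross outlier
grid = [3 + i * 0.001 for i in range(7001)]        # candidate theta values in [3, 10]
theta_hat = min(grid, key=lambda t: sum(rho(v - t) for v in x))
print(theta_hat)    # about 4.37; the ordinary mean (5.0) is pulled much harder by 9.5
```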

EXERCISES Section 6.2 (20–30)

20. A diagnostic test for a certain disease is applied to $n$ individuals known not to have the disease. Let $X =$ the number among the $n$ test results that are positive (indicating presence of the disease, so $X$ is the number of false positives) and $p =$ the probability that a disease-free individual's test result is positive (i.e., $p$ is the true proportion of test results from disease-free individuals that are positive). Assume that only $X$ is available rather than the actual sequence of test results.

a. Derive the maximum likelihood estimator of $p$. If $n$ and the observed value $x$ are given, what is the estimate?

b. Is the estimator of part (a) unbiased?

c. Using the same $n$ and $x$, what is the mle of the probability $(1 - p)^5$ that none of the next five tests done on disease-free individuals are positive?

21. Let $X$ have a Weibull distribution with parameters $\alpha$ and $\beta$, so

$$E(X) = \beta \cdot \Gamma(1 + 1/\alpha) \qquad V(X) = \beta^2\left\{\Gamma(1 + 2/\alpha) - [\Gamma(1 + 1/\alpha)]^2\right\}$$

a. Based on a random sample $X_1, \ldots, X_n$, write equations for the method of moments estimators of $\beta$ and $\alpha$. Show that, once the estimate of $\alpha$ has been obtained, the estimate of $\beta$ can be found from a table of the gamma function and that the estimate of $\alpha$ is the solution to a complicated equation involving the gamma function.

b. If $n$, $\bar{x}$, and $(1/n)\sum x_i^2$ are given, compute the estimates.

22. Let $X$ denote the proportion of allotted time that a randomly selected student spends working on a certain aptitude test. Suppose the pdf of $X$ is

$$f(x; \theta) = (\theta + 1)x^\theta \qquad 0 \le x \le 1$$

where $\theta > -1$. A random sample of ten students yields data $x_1, \ldots, x_{10}$.

a. Use the method of moments to obtain an estimator of $\theta$, and then compute the estimate for these data.

b. Obtain the maximum likelihood estimator of , and then compute the estimate for the given data.

23. Let $X$ represent the error in making a measurement of a physical characteristic or property (e.g., the boiling point of a particular liquid). It is often reasonable to assume that $E(X) = 0$ and that $X$ has a normal distribution. Thus the pdf of any particular measurement error is

$$f(x; \theta) = \frac{1}{\sqrt{2\pi\theta}} e^{-x^2/(2\theta)} \qquad -\infty < x < \infty$$

(where we have used $\theta$ in place of $\sigma^2$). Now suppose that $n$ independent measurements are made, resulting in measurement errors $X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$. Obtain the mle of $\theta$.

24. A vehicle with a particular defect in its emission control system is taken to a succession of randomly selected mechanics until $r = 3$ of them have correctly diagnosed the problem. Suppose that this requires diagnoses by 20 different mechanics (so there were 17 incorrect diagnoses). Let $p = P$(correct diagnosis), so $p$ is the proportion of all mechanics who would correctly diagnose the problem. What is the mle of $p$? Is it the same as the mle if a random sample of 20 mechanics results in 3 correct diagnoses? Explain. How does the mle compare to the estimate resulting from the use of the unbiased estimator given in Exercise 17?

25. The shear strength of each of ten test spot welds is determined, yielding ten observations (psi) $x_1, \ldots, x_{10}$.

a. Assuming that shear strength is normally distributed, estimate the true average shear strength and standard deviation of shear strength using the method of maximum likelihood.

b. Again assuming a normal distribution, estimate the strength value below which 95% of all welds will have their strengths. [Hint: What is the 95th percentile in terms of $\mu$ and $\sigma$? Now use the invariance principle.]

c. Suppose we decide to examine another test spot weld. Let $X =$ shear strength of the weld. Use the given data to obtain the mle of $P(X \le x_0)$ for a specified value $x_0$. [Hint: $P(X \le x_0) = \Phi((x_0 - \mu)/\sigma)$; use the invariance principle.]

26. Consider randomly selecting $n$ segments of pipe and determining the corrosion loss (mm) in the wall thickness for each one. Denote these corrosion losses by $Y_1, \ldots, Y_n$. The article "A Probabilistic Model for a Gas Explosion Due to Leakages in the Grey Cast Iron Gas Mains" (Reliability Engr. and System Safety, 2013: 270-279) proposes a linear corrosion model: $Y_i = t_i R_i$, where $t_i$ is the age of the pipe and $R_i$, the corrosion rate, is exponentially distributed with parameter $\lambda$. Obtain the maximum likelihood estimator of the exponential parameter (the resulting mle appears in the cited article). [Hint: If $c > 0$ and $X$ has an exponential distribution, so does $cX$.]

27. Let $X_1, X_2, \ldots, X_n$ be a random sample from a gamma distribution with parameters $\alpha$ and $\beta$.

a. Derive the equations whose solutions yield the maximum likelihood estimators of $\alpha$ and $\beta$. Do you think they can be solved explicitly?

b. Show that the mle of $\mu = \alpha\beta$ is $\hat{\mu} = \bar{X}$.

28. Let $X_1, X_2, \ldots, X_n$ represent a random sample from the Rayleigh distribution with density function given in Exercise 15. Determine

a. The maximum likelihood estimator of $\theta$, and then calculate the estimate for the vibratory stress data given in that exercise. Is this estimator the same as the unbiased estimator suggested in Exercise 15?

b. The mle of the median of the vibratory stress distribution. [Hint: First express the median in terms of $\theta$.]

29. Consider a random sample $X_1, X_2, \ldots, X_n$ from the shifted exponential pdf

$$f(x; \lambda, \theta) = \begin{cases} \lambda e^{-\lambda(x - \theta)} & x \ge \theta \\ 0 & \text{otherwise} \end{cases}$$

Taking $\theta = 0$ gives the pdf of the exponential distribution considered previously (with positive density to the right of zero). An example of the shifted exponential distribution appeared in Example 4.5, in which the variable of interest was time headway in traffic flow and $\theta$ was the minimum possible time headway.

a. Obtain the maximum likelihood estimators of $\theta$ and $\lambda$.

b. If ten time headway observations are made, resulting in the values $x_1, \ldots, x_9$, and 1.30, calculate the estimates of $\theta$ and $\lambda$.

30. At time $t = 0$, 20 identical components are tested. The lifetime distribution of each is exponential with parameter $\lambda$. The experimenter then leaves the test facility unmonitored. On his return 24 hours later, the experimenter immediately terminates the test after noticing that $y = 15$ of the 20 components are still in operation (so 5 have failed). Derive the mle of $\lambda$. [Hint: Let $Y =$ the number that survive 24 hours. Then $Y \sim \text{Bin}(20, p)$. What is the mle of $p$? Now notice that $p = P(X_i \ge 24)$, where $X_i$ is exponentially distributed. This relates $\lambda$ to $p$, so the former can be estimated once the latter has been.]