When two random variables $X$ and $Y$ are not independent, it is frequently of interest to assess how strongly they are related to one another.
Definition
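The covariance between two rv's $X$ and $Y$ is

$$
\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] =
\begin{cases}
\displaystyle \sum_x \sum_y (x - \mu_X)(y - \mu_Y)\, p(x, y) & X, Y \text{ discrete} \\[2ex]
\displaystyle \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x - \mu_X)(y - \mu_Y)\, f(x, y)\, dx\, dy & X, Y \text{ continuous}
\end{cases}
$$

That is, since $X - \mu_X$ and $Y - \mu_Y$ are the deviations of the two variables from their respective mean values, the covariance is the expected product of deviations. Note that $\mathrm{Cov}(X, X) = E[(X - \mu_X)^2] = V(X)$.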
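As a minimal sketch of the discrete case, the following Python computes a covariance directly as the expected product of deviations and checks that the covariance of a variable with itself equals its variance. The joint pmf values and the helper names `mean_of` and `covariance` are hypothetical, chosen only for illustration; they are not taken from the text's examples.

```python
from fractions import Fraction as F

# Hypothetical joint pmf p(x, y); illustrative values only, chosen to sum to 1.
pmf = {
    (0, 0): F(3, 10), (0, 1): F(1, 10),
    (1, 0): F(2, 10), (1, 1): F(4, 10),
}

def mean_of(pmf, index):
    """Mean of the coordinate at `index` (0 for X, 1 for Y)."""
    return sum(pair[index] * p for pair, p in pmf.items())

def covariance(pmf):
    """Cov(X, Y) as the expected product of deviations from the means."""
    mu_x, mu_y = mean_of(pmf, 0), mean_of(pmf, 1)
    return sum((x - mu_x) * (y - mu_y) * p for (x, y), p in pmf.items())

# Cov(X, X) uses the joint pmf of the pair (X, X), which sits on the diagonal.
pmf_xx = {}
for (x, _), p in pmf.items():
    pmf_xx[(x, x)] = pmf_xx.get((x, x), 0) + p

cov_xy = covariance(pmf)     # expected product of deviations
var_x = covariance(pmf_xx)   # equals V(X)
print(cov_xy, var_x)         # prints: 1/10 6/25
```

Exact fractions are used instead of floats so that the results can be compared with hand calculations without rounding error.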
The rationale for the definition is as follows.
- Suppose $X$ and $Y$ have a strong positive relationship to one another, by which we mean that large values of $X$ tend to occur with large values of $Y$ and small values of $X$ with small values of $Y$.
- Then most of the probability mass or density will be associated with $(x - \mu_X)$ and $(y - \mu_Y)$ either both positive (both $X$ and $Y$ above their respective means) or both negative, so the product $(x - \mu_X)(y - \mu_Y)$ will tend to be positive.
- Thus for a strong positive relationship, $\mathrm{Cov}(X, Y)$ should be quite positive.
- For a strong negative relationship, the signs of $(x - \mu_X)$ and $(y - \mu_Y)$ will tend to be opposite, yielding a negative product.
- Thus for a strong negative relationship, $\mathrm{Cov}(X, Y)$ should be quite negative.
- If $X$ and $Y$ are not strongly related, positive and negative products will tend to cancel one another, yielding a covariance near 0.
Figure 5.4 illustrates the different possibilities. The covariance depends on both the set of possible $(x, y)$ pairs and the probabilities. In Figure 5.4, the probabilities could be changed without altering the set of possible pairs, and this could drastically change the value of $\mathrm{Cov}(X, Y)$.
Figure 5.4 $p(x, y) = 1/10$ for each of ten pairs corresponding to indicated points: (a) positive covariance; (b) negative covariance; (c) covariance near zero
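The sign behavior described above, and the effect of changing probabilities while keeping the same pairs, can be sketched in Python. The point sets below are hypothetical and are not the points plotted in Figure 5.4; the function name `covariance` is likewise an illustrative choice.

```python
from fractions import Fraction as F

def covariance(points, probs):
    """Cov(X, Y) for the given (x, y) pairs with matching probabilities."""
    mu_x = sum(x * p for (x, _), p in zip(points, probs))
    mu_y = sum(y * p for (_, y), p in zip(points, probs))
    return sum((x - mu_x) * (y - mu_y) * p for (x, y), p in zip(points, probs))

# Ten hypothetical pairs, each with probability 1/10.
rising = [(i, i) for i in range(1, 11)]         # large x occurs with large y
falling = [(i, 11 - i) for i in range(1, 11)]   # large x occurs with small y
uniform = [F(1, 10)] * 10

cov_rising = covariance(rising, uniform)        # positive
cov_falling = covariance(falling, uniform)      # negative

# Same set of pairs, two different probability assignments.
corners = [(0, 0), (0, 1), (1, 0), (1, 1)]
cov_equal = covariance(corners, [F(1, 4)] * 4)  # products cancel exactly
cov_reweighted = covariance(corners, [F(4, 10), F(1, 10), F(1, 10), F(4, 10)])

print(cov_rising, cov_falling, cov_equal, cov_reweighted)
# prints: 33/4 -33/4 0 3/20
```

In the last two lines the set of possible pairs is identical; only the probabilities differ, and the covariance moves from 0 to a positive value.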
The following shortcut formula for $\mathrm{Cov}(X, Y)$ simplifies the computations.
Proposition

$$
\mathrm{Cov}(X, Y) = E(XY) - \mu_X \cdot \mu_Y
$$

According to this formula, no intermediate subtractions are necessary; only at the end of the computation is $\mu_X \cdot \mu_Y$ subtracted from $E(XY)$. The proof involves expanding $(X - \mu_X)(Y - \mu_Y)$ and then carrying the summation or integration through to each individual term.
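The expansion behind the proof can be written out as follows. It uses only the linearity of expectation and the fact that $\mu_X = E(X)$ and $\mu_Y = E(Y)$ are constants:

$$
\begin{aligned}
\mathrm{Cov}(X, Y) &= E[(X - \mu_X)(Y - \mu_Y)] \\
&= E[XY - \mu_X Y - \mu_Y X + \mu_X \mu_Y] \\
&= E(XY) - \mu_X E(Y) - \mu_Y E(X) + \mu_X \mu_Y \\
&= E(XY) - \mu_X \mu_Y - \mu_X \mu_Y + \mu_X \mu_Y \\
&= E(XY) - \mu_X \mu_Y
\end{aligned}
$$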
EXAMPLE 5.16 (Example 5.5 continued)
It might appear that the relationship in the insurance example is quite strong since $\mathrm{Cov}(X, Y) = 136{,}875$, whereas the value of $\mathrm{Cov}(X, Y)$ in the nut example would seem to imply quite a weak relationship. Unfortunately, the covariance has a serious defect that makes it impossible to interpret a computed value. In the insurance example, suppose we had expressed the deductible amount in cents rather than in dollars. Then $100X$ would replace $X$, $100Y$ would replace $Y$, and the resulting covariance would be $\mathrm{Cov}(100X, 100Y) = (100)(100)\,\mathrm{Cov}(X, Y) = 1{,}368{,}750{,}000$. If, on the other hand, the deductible amount had been expressed in hundreds of dollars, the computed covariance would have been $(.01)(.01)(136{,}875) = 13.6875$. The defect of covariance is that its computed value depends critically on the units of measurement. Ideally, the choice of units should have no effect on a measure of strength of relationship. This is achieved by scaling the covariance.
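The dependence on units can be checked numerically with a short Python sketch. The joint pmf below is hypothetical (it is not the text's insurance data), and the helpers `covariance` and `rescale` are illustrative names; the covariance is computed with the shortcut formula $E(XY) - \mu_X \cdot \mu_Y$.

```python
from fractions import Fraction as F

def covariance(pmf):
    """Cov(X, Y) = E(XY) - mu_X * mu_Y for a joint pmf {(x, y): p}."""
    mu_x = sum(x * p for (x, _), p in pmf.items())
    mu_y = sum(y * p for (_, y), p in pmf.items())
    e_xy = sum(x * y * p for (x, y), p in pmf.items())
    return e_xy - mu_x * mu_y

def rescale(pmf, a, b):
    """Joint pmf of (aX, bY): same probabilities, rescaled values."""
    return {(a * x, b * y): p for (x, y), p in pmf.items()}

# Hypothetical deductible amounts in dollars; illustrative values only.
dollars = {
    (100, 500): F(3, 10), (100, 1000): F(1, 5),
    (500, 500): F(1, 10), (500, 1000): F(2, 5),
}

cov_dollars = covariance(dollars)
cov_cents = covariance(rescale(dollars, 100, 100))
cov_hundreds = covariance(rescale(dollars, F(1, 100), F(1, 100)))

print(cov_dollars, cov_cents, cov_hundreds)
# prints: 20000 200000000 2
```

The three values describe exactly the same relationship between the two variables; they differ only by the factors $(100)(100)$ and $(.01)(.01)$ introduced by the change of units.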