Definition
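The correlation coefficient of $X$ and $Y$, denoted by $\operatorname{Corr}(X, Y)$, $\rho_{X,Y}$, or just $\rho$, is defined by
$$\rho_{X,Y} = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \cdot \sigma_Y}$$
The following proposition shows that $\rho$ remedies the defect of $\operatorname{Cov}(X, Y)$ and also suggests how to recognize the existence of a strong (linear) relationship.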
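
As a minimal sketch of the definition, the following Python computes the correlation coefficient as the covariance divided by the product of the two standard deviations. The joint pmf `joint_pmf` and the helper names `expectation` and `correlation` are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical joint pmf p(x, y) for illustration; the probabilities sum to 1.
joint_pmf = {
    (0, 0): 0.4,
    (0, 1): 0.1,
    (1, 0): 0.1,
    (1, 1): 0.4,
}


def expectation(pmf, g):
    """Return E[g(X, Y)] under a discrete joint pmf."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())


def correlation(pmf):
    """Return Cov(X, Y) / (sigma_X * sigma_Y) for a discrete joint pmf."""
    mu_x = expectation(pmf, lambda x, y: x)
    mu_y = expectation(pmf, lambda x, y: y)
    cov = expectation(pmf, lambda x, y: (x - mu_x) * (y - mu_y))
    var_x = expectation(pmf, lambda x, y: (x - mu_x) ** 2)
    var_y = expectation(pmf, lambda x, y: (y - mu_y) ** 2)
    return cov / math.sqrt(var_x * var_y)


rho = correlation(joint_pmf)
print(round(rho, 4))
```

For this particular pmf the covariance is 0.15 and each variance is 0.25, so the ratio works out to 0.6.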
Proposition
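- If $a$ and $c$ are either both positive or both negative, $\operatorname{Corr}(aX + b, cY + d) = \operatorname{Corr}(X, Y)$.
- For any two rv's $X$ and $Y$, $-1 \le \operatorname{Corr}(X, Y) \le 1$. The two variables are said to be uncorrelated when $\rho = 0$.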
Statement 1 says precisely that the correlation coefficient is not affected by a linear change in the units of measurement (if, say, $X$ = temperature in °C, then $9X/5 + 32$ = temperature in °F).
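
The invariance under a change of units can be checked numerically. The sketch below assumes a small set of equally likely (temperature, response) pairs; the data values and the helper `corr` are hypothetical. It converts the temperatures from Celsius to Fahrenheit and compares the two correlations. It also illustrates why the sign condition matters: a multiplier of the opposite sign reverses the sign of the correlation.

```python
import math


def corr(pairs):
    """Correlation of (X, Y) when each listed pair is equally likely."""
    n = len(pairs)
    mu_x = sum(x for x, _ in pairs) / n
    mu_y = sum(y for _, y in pairs) / n
    cov = sum((x - mu_x) * (y - mu_y) for x, y in pairs) / n
    var_x = sum((x - mu_x) ** 2 for x, _ in pairs) / n
    var_y = sum((y - mu_y) ** 2 for _, y in pairs) / n
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: X = temperature in degrees Celsius, Y = some response.
pairs_c = [(10.0, 3.1), (15.0, 4.0), (20.0, 4.4), (25.0, 6.2), (30.0, 6.9)]

# Same pairs with temperature in degrees Fahrenheit (multiplier 9/5 > 0).
pairs_f = [(9 * x / 5 + 32, y) for x, y in pairs_c]

# A negative multiplier on X (here -1) flips the sign of the correlation.
pairs_neg = [(-x, y) for x, y in pairs_c]

rho_c = corr(pairs_c)
rho_f = corr(pairs_f)
rho_neg = corr(pairs_neg)

print(math.isclose(rho_c, rho_f, rel_tol=1e-9))     # unchanged by the unit change
print(math.isclose(rho_neg, -rho_c, rel_tol=1e-9))  # sign reversed
```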
According to Statement 2,
- the strongest possible positive relationship is evidenced by $\rho = +1$,
- the strongest possible negative relationship corresponds to $\rho = -1$,
- $\rho = 0$ indicates the absence of a linear relationship.

The proof of the first statement is sketched in Exercise 35, and that of the second appears in Supplementary Exercise 87 at the end of the chapter. For descriptive purposes, the relationship will be described as strong if $|\rho| \ge 0.8$, moderate if $0.5 < |\rho| < 0.8$, and weak if $|\rho| \le 0.5$.
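
The descriptive labels can be expressed as a small helper. This is only a sketch: the function name `describe_strength` is hypothetical, and the cutoffs 0.8 and 0.5 on the absolute value of the correlation are an assumption stated in the code.

```python
def describe_strength(rho):
    """Label a correlation as 'strong', 'moderate', or 'weak'.

    Assumed cutoffs: strong if |rho| >= 0.8, weak if |rho| <= 0.5,
    and moderate in between.
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    size = abs(rho)
    if size >= 0.8:
        return "strong"
    if size > 0.5:
        return "moderate"
    return "weak"


for value in (-0.9, 0.6, 0.5, 0.0):
    print(value, describe_strength(value))
```

Only the magnitude of the correlation enters the label, so a value of -0.9 is described as strong just as +0.9 would be.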
If we think of $p(x, y)$ or $f(x, y)$ as prescribing a mathematical model for how the two numerical variables $X$ and $Y$ are distributed in some population (height and weight, verbal SAT score and quantitative SAT score, etc.), then $\rho$ is a population characteristic or parameter that measures how strongly $X$ and $Y$ are related in the population. In Chapter 12, we will consider taking a sample of pairs $(x_1, y_1), \ldots, (x_n, y_n)$ from the population. The sample correlation coefficient $r$ will then be defined and used to make inferences about $\rho$.
The correlation coefficient is actually not a completely general measure of the strength of a relationship.
Proposition
- If $X$ and $Y$ are independent, then $\rho = 0$, but $\rho = 0$ does not imply independence.
- $\rho = 1$ or $-1$ iff $Y = aX + b$ for some numbers $a$ and $b$ with $a \ne 0$.
This proposition says that $\rho$ is a measure of the degree of linear relationship between $X$ and $Y$, and only when the two variables are perfectly related in a linear manner will $\rho$ be as positive or negative as it can be. However, if $|\rho| \ll 1$, there may still be a strong relationship between the two variables, just one that is not linear. And even if $|\rho|$ is close to 1, it may be that the relationship is really nonlinear but can be well approximated by a straight line.
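
Both cautions can be illustrated with a quadratic relationship. The sketch below uses two hypothetical distributions, each placing equal probability on a few points of the curve $y = x^2$; the helper `corr` is likewise an illustrative name. In the first case the variables are completely dependent yet uncorrelated. In the second the relationship is nonlinear yet the correlation is close to 1.

```python
import math


def corr(pairs):
    """Correlation of (X, Y) when each listed pair is equally likely."""
    n = len(pairs)
    mu_x = sum(x for x, _ in pairs) / n
    mu_y = sum(y for _, y in pairs) / n
    cov = sum((x - mu_x) * (y - mu_y) for x, y in pairs) / n
    var_x = sum((x - mu_x) ** 2 for x, _ in pairs) / n
    var_y = sum((y - mu_y) ** 2 for _, y in pairs) / n
    return cov / math.sqrt(var_x * var_y)


# X equally likely to be -1, 0, or 1, with Y = X^2. Y is determined by X,
# so the variables are dependent: P(X = 0, Y = 0) = 1/3, whereas
# P(X = 0) * P(Y = 0) = 1/9. The correlation is nevertheless 0.
symmetric = [(x, x ** 2) for x in (-1, 0, 1)]
rho_symmetric = corr(symmetric)

# X equally likely to be 1, 2, ..., 10, with Y = X^2. The relationship is
# nonlinear, but a straight line approximates it well over this range,
# so the correlation is close to 1 (about 0.97).
one_sided = [(x, x ** 2) for x in range(1, 11)]
rho_one_sided = corr(one_sided)

print(rho_symmetric, round(rho_one_sided, 2))
```

The contrast comes entirely from the range of $X$: on a range symmetric about 0 the increasing and decreasing halves of the parabola cancel in the covariance, while on a one-sided range the curve is monotone and nearly linear.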