The Relationship Between Paired Variables
Correlation measures the strength of the linear relationship between paired numeric variables. The rest of this page explains what those words mean.
For example, if you measure the height and weight of each person in a group, your data is paired because every height measurement has a matching weight measurement from the same person.
- Two variables are paired if each value from one variable can be sensibly paired with a single value from the other;
- There is a relationship between two variables if knowing the value of one from the pair tells you something about the likely value of the other;
- The strength of the relationship determines how much doubt about the value of one variable is removed by knowing the value of the other;
- A relationship is linear if plotting one variable against the other on a scatter plot produces a cloud of points around a straight line;
- The values have to be numbers (rather than categories such as male or female). In addition, the data should have several other properties, which are examined at level 3 of this topic.
Examples of paired measurements include:
- Weight and height of a given person
- Temperature and rainfall on a given day
- Time spent studying and exam performance for a given student
If you measured the heights of 10 people and the weights of 10 different people, you could not calculate a correlation between the two groups, because you cannot say which height measurement is paired with which weight (and you can't just pair them off at random either!).
Both variables need to be numbers for a correlation to be calculated, so you cannot calculate the correlation between height and gender, for example.
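In code, one way to make pairing concrete is to record each value against an identifier for the thing being measured. You then pair only the values that share an identifier. The sketch below is hypothetical: the names, the two dictionaries and the pair_by_id helper are all made up for illustration. People who have only one of the two measurements are left out rather than being matched with someone else.

```python
# Hypothetical measurements, each keyed by the person they belong to.
heights_cm = {"ann": 162, "bob": 180, "cai": 175, "dev": 158}
weights_kg = {"bob": 82, "ann": 55, "dev": 51, "eve": 70}

def pair_by_id(first, second):
    """Return (first_value, second_value) pairs for IDs present in both dicts.

    IDs that appear in only one dict are dropped: they have no partner value.
    """
    shared_ids = sorted(first.keys() & second.keys())
    return [(first[pid], second[pid]) for pid in shared_ids]

pairs = pair_by_id(heights_cm, weights_kg)
print(pairs)  # [(162, 55), (180, 82), (158, 51)]
```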
What Does Correlation Measure?
Correlation measures two things about the relationship between two variables:
- The direction of the relationship:
  - A positive relationship means that when values of one variable are high, values of the other, paired variable are also high, and when values of one variable are low, so are values of the other. In other words, both variables move in the same direction together.
  - A negative relationship means that when one variable is high, the other is low, and vice versa. In other words, the two variables move in opposite directions.
- The strength of the relationship:
  - A strong relationship is one where the pairs consistently follow the same pattern (in a strong positive relationship, for example, high values almost always go with high values and low values with low values).
  - A weak relationship is one where the pairs follow a mixture of patterns (in some pairs both values are high, but in others one value is high and the other is low).
There are also things that correlation does not tell you. They are:
- Whether or not the variation in one variable is caused by changes in the other. For example, there is a correlation between the age of children and the age of their parents, but children do not cause their parents to age (well, not much anyway).
- How to predict the value of one variable from the other. I could tell you that income and house value have a high correlation, so you would know that the two were related, but you couldn't actually predict house value from income without further information.
Understanding the Correlation Coefficient
There is more than one way of calculating the correlation between two variables, but we will discuss a method called Pearson's correlation coefficient, which produces a single number between -1 and 1.
Correlation is best understood by looking at scatter plots. The sign and size of the coefficient describe the relationship as follows:
- A positive value indicates a positive relationship, as defined above. A coefficient of +1 means a perfect (as strong as possible) positive correlation.
- A negative value indicates a negative relationship. A coefficient of -1 means a perfect negative correlation.
- Ignoring the sign, the stronger the relationship between the variables, the closer the coefficient is to 1. So 1 and -1 both indicate the strongest possible relationship, but in opposite directions. Coefficients of 0.5 and -0.5 indicate weaker relationships of equal strength, in opposite directions.
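Pearson's coefficient compares each value with the mean of its variable. The from-scratch Python sketch below shows the calculation; the function name pearson_r is made up for this example. A pair where both values are above their means, or both below, adds to the sum of products. A pair on opposite sides of the means subtracts from it. Dividing by the spread of each variable scales the result so it always lies between -1 and 1.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired samples xs and ys.

    r = sum((x - mean_x) * (y - mean_y))
        / sqrt(sum((x - mean_x)**2) * sum((y - mean_y)**2))
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must be paired (same length)")
    if len(xs) < 2:
        raise ValueError("need at least two pairs")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Deviations: positive for values above the mean ("high"), negative below ("low").
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # High-high and low-low pairs add to this sum; high-low pairs subtract from it.
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    # Dividing by the spreads keeps the result in the range [-1, 1].
    return sxy / math.sqrt(sxx * syy)
```

For example, pearson_r([1, 2, 3, 4], [2, 4, 6, 8]) gives 1.0, and pearson_r([1, 2, 3, 4], [8, 6, 4, 2]) gives -1.0. These are the two extremes described above.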
Consider a scatter plot of price against demand for some product. As price rises, demand falls, so the correlation is negative. For any given price, there is still considerable variation in demand, so the correlation is not very strong. In fact, it is -0.5, which indicates that there is a relationship, but one that doesn't explain all of the variation in demand.
Note also that correlation is symmetrical: the correlation of X with Y is the same as the correlation of Y with X.