## Correlation


### Explanation

The Relationship Between Paired Variables
Correlation measures the strength of the linear relationship between paired numeric variables. The rest of this page explains what those words mean.
• Two variables are paired if each value from one variable can be sensibly paired with a single value from the other;
• There is a relationship between two variables if knowing the value of one from the pair tells you something about the likely value of the other;
• The strength of the relationship determines how much doubt about the value of one variable is removed by knowing the value of the other;
• A relationship is linear if plotting one variable against the other on a scatter plot produces a cloud of points around a straight line;
• The values have to be numbers (rather than categories such as male and female). In addition, the values in the data should have a number of other properties that will be examined at level 3 of this topic.
Your data is paired because each height measurement has a corresponding weight measurement from the same person.

Examples of paired measurements include:

• Weight and height of a given person
• Temperature and rainfall on a given day
• Time spent studying and exam performance for a given student
If you measured the heights of 10 people and the weights of 10 different people, you could not calculate a correlation between the two groups, because you cannot say which height measurement is paired with each weight (and you can't just pair them off at random either!).

Both variables need to be numbers for a correlation to be calculated, so you cannot calculate the correlation between height and gender, for example.

What Does Correlation Measure?
Correlation measures two things about the relationship between two variables:

1. The direction of the relationship:
• A positive relationship means that when values of one variable are high, then values of the other, paired variable are also high and when values of one variable are low, so are values of the other. In other words, both variables move in the same direction together.
• A negative relationship means that when one variable is high, the other is low, and vice versa. In other words, the two variables move in opposite directions to each other.
2. The strength of the relationship:
• A strong relationship is one where the paired values consistently follow the same pattern (high values reliably go with high values, or reliably go with low values, depending on the direction).
• A weak relationship is one where the pattern is mixed (in some pairs both values are high, but in others one value is high and the other is low).
There are also things that correlation does not tell you:
• Whether or not changes in one variable cause the changes in the other. For example, there is a correlation between the age of children and the age of their parents, but children do not cause their parents to age (well, not much anyway).
• How to predict the value of one variable from the other. I could tell you that income and house value have a high correlation, so you would know that the two were related, but you couldn't actually predict house value from income without further information.

Understanding the Correlation Coefficient
There is more than one way of calculating the correlation between two variables, but we will discuss a method called Pearson's correlation coefficient, which produces a single number between -1 and 1.
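For reference, the standard formula for Pearson's coefficient for $n$ pairs $(x_i, y_i)$, where $\bar{x}$ and $\bar{y}$ are the means of the two variables, is:

$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$

The numerator is positive when pairs tend to be above (or below) their means together and negative when one tends to be above its mean while the other is below. The denominator scales the result so that it always lies between -1 and 1.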

• A positive value indicates a positive relationship, as defined above. A coefficient of +1 means a perfect (as strong as possible) positive correlation.
• A negative value indicates a negative relationship. A coefficient of -1 means a perfect negative correlation.
• Ignoring the sign, the stronger the relationship between the variables, the closer to 1 the correlation becomes. So 1 and -1 both indicate the strongest possible relationship, but in opposite directions. Correlation coefficients of -0.5 and 0.5 indicate weaker relationships of equal strength, in opposite directions.
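Here is a from-scratch sketch of Pearson's coefficient in Python, written for illustration (the function name `pearson_r` is hypothetical, not from the original page). It follows the standard formula: the sum of products of paired deviations from the means, divided by the square root of the product of the two sums of squared deviations.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, with at least two pairs")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of paired deviations: positive when pairs move together.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations, used to scale the result into [-1, 1].
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))        # 1.0 (perfect positive)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))        # -1.0 (perfect negative)
print(pearson_r([1, 2, 3, 4, 5], [3, 1, 4, 2, 5]))  # 0.5 (weaker positive)
```

The guard against constant variables is a design choice: if every value of one variable is the same, the denominator is zero and the coefficient is undefined.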
Correlation is best understood by looking at scatter plots. Here is an example, with an explanation of its correlation.

Consider a plot of price against demand for some product. As price rises, demand falls, so the correlation is negative. For any given price there is still considerable variation in demand, so the correlation is fairly weak. In fact, it is -0.5, which indicates that there is a relationship, but one that does not explain all of the variation in demand.

Note also that correlation is symmetrical: the correlation of X with Y is the same as the correlation of Y with X.

### Exploration

Use this simple demonstration to see how different pairs of variables, plotted against each other, lead to different correlation values.
1. Look at the pattern of points on the scatter plot
2. Guess what the correlation value of the data shown might be
3. Click anywhere on the chart to see the true correlation value and to see a different set of data
4. Repeat until you are guessing well each time - remember to guess the direction (sign) too.


Let's test what you have learned.

How does the amount that points are spread out away from a straight line affect correlation?

How does the direction of the relationship affect correlation?

How does the steepness of the slope of the relationship affect correlation?

### Application

The correlation between height and weight in your data is 0.42. Please answer the following test questions:
As height increases in your data, what would you expect weight to do?
A friend has done the same experiment and found a correlation value of -0.42. How does their value compare to yours?
Another friend has done the same experiment and found a correlation value of 0.21. How does their value compare to yours?
Another friend has done the same experiment and found a correlation value of 0.65. How does their value compare to yours?
Consider this statement: "Weight is always 0.42 of height." Is that right?