Statistics Tutorial

Correlation

Level 3
Next Topic: Samples and Populations | The Normal Distribution

Explanation

Avoiding Mistakes with Correlation

At level one of this topic, we said that correlation only works with variables that have a linear relationship and whose values are numbers.
Pearson's correlation coefficient measures how closely two numeric variables are linearly related. Two variables are perfectly linearly related if, when plotted against each other, their points fall exactly on a straight line; the closer the points come to a line, the stronger the linear relationship. This means that two variables could be perfectly related and still produce a very low correlation if their relationship is non-linear (a U-shaped curve, for example). It is also easy to get a very high, and misleading, correlation from a sample that contains outliers, because a single point far from the rest can dominate the calculation.
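Pearson's r can be computed directly from its definition, r = Σ(x − x̄)(y − ȳ) / √[Σ(x − x̄)² · Σ(y − ȳ)²], where x̄ and ȳ are the two means. The minimal Python sketch below is illustrative only: the function name `pearson_r` and the sample values are hypothetical. It compares points on a straight line with points on a perfect U-shaped curve, where y is completely determined by x and yet r comes out as zero.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient, computed from its definition."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Deviation of each value from its own variable's mean
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        # A variable that never changes has no spread, so r is undefined
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


xs = [-3, -2, -1, 0, 1, 2, 3]
line = [2 * x + 1 for x in xs]   # perfectly linear relationship
u_shape = [x * x for x in xs]    # perfectly related, but U-shaped

print(round(pearson_r(xs, line), 3))     # 1.0
print(round(pearson_r(xs, u_shape), 3))  # 0.0
```

The U-shaped values rise on both sides of x = 0, so the positive and negative products in the numerator cancel exactly.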

So we can conclude the following:

  • Low correlation might mean that there is no relationship between the variables, but it could also result from a non-linear relationship;
  • High correlation could mean a strong linear relationship, but it might also be caused by outliers in the data.

Exploration

This game lets you explore how the value of Pearson's correlation coefficient changes depending on how far the data points fall from a straight line. The points on the scatter plot show the values of paired variables, and the line is the straight line that best fits the data. If all the points lie on this line, the correlation is 1 (line sloping up) or -1 (line sloping down), unless the line is perfectly horizontal. The further the points lie from the line, the weaker the correlation.

Select any point by clicking on it, which will make it turn red. Then click anywhere on the chart to move the selected point to a new position and see how the line and the correlation value change. There are buttons along the top of the game to allow you to arrange the points along a flat horizontal line, at random, or sloping up or down.

Here are some things to try:

  1. Arrange the points in a flat horizontal line. What is the correlation value?
  2. Move one point far from the line and see what happens to the correlation value. Put that point back and try another, then another. Which points have the largest effect on the correlation?
  3. Arrange the points to give a correlation value of 1. By moving only one point, how can you reduce the correlation value the most?
  4. Click the random button. Then move one point and try to get a high correlation value. Where should you move that point?

Here are a few extra things for you to try:

  1. Try to produce a high correlation value for unrelated data points.
  2. Try to produce a low correlation value for mostly related data points.

Application

Look at the scatter plot of your data below. We have added the straight line that best fits all the points in your data.

Figure: Performance Rating Plotted Against Written

The correlation coefficient of this data is 0.8. Look at the points in the scatter plot and use what you learned from the game above to answer the following questions.

  • Do you think the correlation coefficient of 0.8 reflects the structure of the data well?
  • Is there a non-linear relationship in the data that the correlation does not reflect?
  • Does a single outlier artificially make the correlation coefficient larger than it should be?