## Correlation


### Explanation

Calculating Correlation
The correlation between two variables may be calculated in more than one way. We will tell you about a measure known as Pearson's Product Moment Coefficient, which is commonly used.

Pearson's measure is denoted by the letter r. In order to understand what the calculation for r is doing, we need to think for a moment about what correlation really means. We saw in level 1 that the correlation coefficient tells you two things about the linear relationship between two variables:

• The direction of the relationship (positive or negative)
• The strength of the relationship - that is, how consistently linear it is

It is easiest to think of both of these with reference to a scatter plot of the data. If there is a linear relationship between the two variables, the points will tend to form a straight line on the scatter plot. If the points all line up perfectly, then the correlation is 1 (or -1 if the line slopes downward). If the points cluster about a line, forming a thin cloud, then the correlation will be closer to 0. The thicker the cloud, the weaker the correlation.

The section below steps you through a visual explanation of how the formula for Pearson's Correlation Coefficient is derived.

Step 1 of 8: Consider the diagram to the right. It shows a scatterplot of two variables, X and Y. You can see that they are related, but not perfectly.

### Exploration

Here is the formula for calculating Pearson's Correlation Coefficient:

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n-1)\, s_x s_y}$$

The top part calculates the contribution of the distances from the mean, as explained above. The bottom part does two things: it divides by the number of data points minus 1, so that the value is comparable whatever the size of the data set, and it divides by the product of the standard deviations of the two variables, so that the value does not depend on how widely each variable is spread.
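As a rough illustration of how the top and bottom parts fit together, here is a minimal sketch in Python. The function name `pearson_r` is hypothetical, and the sketch assumes the standard deviations are sample standard deviations (computed with n - 1), which is what `statistics.stdev` returns.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Sum of deviation products divided by (n - 1) * s_x * s_y."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    # Top of the formula: sum of (x - mean of x) * (y - mean of y).
    top = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Bottom of the formula: (n - 1) times the two sample standard deviations.
    bottom = (n - 1) * stdev(xs) * stdev(ys)
    return top / bottom

# Made-up data lying exactly on a rising line.
r_line = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(r_line, 6))
```

If either variable has no variation at all, its standard deviation is 0 and the division fails, which matches the fact that correlation is undefined in that case.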

### Application

To calculate the correlation coefficient of your data, you will need to know the following:
• The mean of cost (denoted as $\overline{\text{Cost}}$) is 517.53 and the mean of no. apps (denoted as $\overline{\text{No. Apps}}$) is 56.73
• The standard deviation of cost is 298.47
• The standard deviation of no. apps is 30.64
• You have 30 data points

Your data are shown below. The first five rows have spaces for you to fill in the calculations, but the rest have been completed for you.

1. Subtract the mean of cost from each cost value and enter the result in the first column (keep the minus sign if there is one).
2. Do the same for each value of no. apps and its mean and enter its value in the second column.
3. Finally, multiply the two differences together to give you their product and enter that in the final column.
Enter answers to 2 decimal places.
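
| Cost | No. Apps | Cost - $\overline{\text{Cost}}$ | No. Apps - $\overline{\text{No. Apps}}$ | Product |
|---|---|---|---|---|
| 71 | 5 | | | |
| 663 | 62 | | | |
| 980 | 110 | | | |
| 138 | 12 | | | |
| 861 | 83 | | | |
| 145 | 41 | -372.53 | -15.73 | 5859.9 |
| 493 | 46 | -24.53 | -10.73 | 263.21 |
| 548 | 75 | 30.47 | 18.27 | 556.69 |
| 251 | 23 | -266.53 | -33.73 | 8990.06 |
| 1024 | 100 | 506.47 | 43.27 | 21914.96 |
| 435 | 41 | -82.53 | -15.73 | 1298.2 |
| 772 | 75 | 254.47 | 18.27 | 4649.17 |
| 71 | 10 | -446.53 | -46.73 | 20866.35 |
| 663 | 59 | 145.47 | 2.27 | 330.22 |
| 381 | 62 | -136.53 | 5.27 | -719.51 |
| 138 | 15 | -379.53 | -41.73 | 15837.79 |
| 861 | 78 | 343.47 | 21.27 | 7305.61 |
| 145 | 59 | -372.53 | 2.27 | -845.64 |
| 493 | 56 | -24.53 | -0.73 | 17.91 |
| 548 | 56 | 30.47 | -0.73 | -22.24 |
| 251 | 27 | -266.53 | -29.73 | 7923.94 |
| 1009 | 110 | 491.47 | 53.27 | 26180.61 |
| 435 | 46 | -82.53 | -10.73 | 885.55 |
| 772 | 110 | 254.47 | 53.27 | 13555.62 |
| 493 | 46 | -24.53 | -10.73 | 263.21 |
| 548 | 72 | 30.47 | 15.27 | 465.28 |
| 251 | 23 | -266.53 | -33.73 | 8990.06 |
| 988 | 100 | 470.47 | 43.27 | 20357.24 |
| 435 | 41 | -82.53 | -15.73 | 1298.2 |
| 663 | 59 | 145.47 | 2.27 | 330.22 |
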
4. Now add up all the values in the product column and enter the total here:
You have completed the calculation for the top of the formula: $\sum (x - \bar{x})(y - \bar{y})$.
5. Now multiply (n-1) by the standard deviation for cost multiplied by the standard deviation for no. apps:
This is $(n-1)\, s_x s_y$, the bottom of the formula.
6. Finally, divide the result from step 4 by the result from step 5 to get the correlation coefficient:
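
Once you have worked through the steps by hand, a short script can be used to check your answers. The sketch below follows steps 1 to 6 using the means and standard deviations stated above. The variable names are illustrative, and the first five (cost, no. apps) pairs are read from the table's unfilled rows, a reading that agrees with the stated means.

```python
# (cost, no. apps) pairs from the data table, in order.
data = [
    (71, 5), (663, 62), (980, 110), (138, 12), (861, 83),
    (145, 41), (493, 46), (548, 75), (251, 23), (1024, 100),
    (435, 41), (772, 75), (71, 10), (663, 59), (381, 62),
    (138, 15), (861, 78), (145, 59), (493, 56), (548, 56),
    (251, 27), (1009, 110), (435, 46), (772, 110), (493, 46),
    (548, 72), (251, 23), (988, 100), (435, 41), (663, 59),
]

# Summary values as stated in the text.
MEAN_COST, MEAN_APPS = 517.53, 56.73
SD_COST, SD_APPS = 298.47, 30.64
n = len(data)

# Steps 1-3: differences from the means and their products, to 2 decimal places.
rows = []
for cost, apps in data:
    d_cost = round(cost - MEAN_COST, 2)
    d_apps = round(apps - MEAN_APPS, 2)
    rows.append((d_cost, d_apps, round(d_cost * d_apps, 2)))

# Step 4: top of the formula, the sum of the products.
top = sum(product for _, _, product in rows)

# Step 5: bottom of the formula, (n - 1) * s_x * s_y.
bottom = (n - 1) * SD_COST * SD_APPS

# Step 6: the correlation coefficient.
r = top / bottom

for d_cost, d_apps, product in rows[:5]:
    print(f"{d_cost:10.2f} {d_apps:10.2f} {product:12.2f}")
print(f"top = {top:.2f}, bottom = {bottom:.2f}, r = {r:.2f}")
```

Because the differences are rounded to 2 decimal places before multiplying, as the exercise asks, the result may differ slightly in the last digits from a calculation on unrounded values.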