Cost per No. of Applications

The Normal Distribution

You are currently on The Normal Distribution at level 3.

Explanation

Sampling Distribution of the Sample Means
We have already introduced the idea that sample means vary from sample to sample. If you took many samples (rather than just the one you got from your study), you would generally get a different mean each time. These means form a distribution known as the sampling distribution of the sample means.

Now, we introduce an interesting fact:

Provided the samples are big enough, no matter how the values in the population are distributed, the distribution of these sample means will be approximately normal!

This might take a moment to think about, but it is true. If you were to do the following:

  1. Take a sample of (say) 20 measurements from a population using simple random sampling;
  2. Calculate the mean of that sample;
  3. Record that mean and repeat the process from step one lots of times;
  4. When you have many samples, take the list of means (one from each sample) and plot a frequency histogram for them.
The shape of that histogram will be approximately normal. This fact is explained by the Central Limit Theorem, which you can read about in the extra topic below.
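The four steps above can be tried out with a short simulation. The sketch below is only an illustration, not part of the tutorial's own software: the population is a made-up, deliberately skewed set of 100,000 values (an assumption chosen so that it looks nothing like a normal curve), and the function names sample_means and text_histogram are hypothetical. It uses only the Python standard library.

```python
import random
import statistics


def sample_means(population, sample_size=20, num_samples=2000, seed=0):
    """Steps 1 to 3: draw many simple random samples and record each mean."""
    rng = random.Random(seed)
    means = []
    for _ in range(num_samples):
        # Step 1: simple random sample (without replacement) from the population
        sample = rng.sample(population, sample_size)
        # Steps 2 and 3: calculate the mean of the sample and record it
        means.append(statistics.fmean(sample))
    return means


def text_histogram(values, bins=15, width=50):
    """Step 4: a crude frequency histogram drawn with '#' characters."""
    lo, hi = min(values), max(values)
    bin_width = (hi - lo) / bins or 1.0  # avoid dividing by zero if all values are equal
    counts = [0] * bins
    for v in values:
        index = min(int((v - lo) / bin_width), bins - 1)
        counts[index] += 1
    peak = max(counts)
    lines = []
    for i, count in enumerate(counts):
        left_edge = lo + i * bin_width
        lines.append(f"{left_edge:8.2f} | {'#' * round(width * count / peak)}")
    return "\n".join(lines)


# Assumed population for illustration only: strongly right-skewed values
# with a mean of roughly 10 (exponentially distributed).
_population_rng = random.Random(42)
population = [_population_rng.expovariate(1 / 10) for _ in range(100_000)]

means = sample_means(population)
print("population mean:     ", round(statistics.fmean(population), 2))
print("mean of sample means:", round(statistics.fmean(means), 2))
print(text_histogram(means))
```

With a population this skewed, the histogram for samples of 20 is only roughly bell-shaped. Re-running with a larger sample_size should bring it closer to a normal curve, which matches the "provided the samples are big enough" condition above.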

Remember that all these different samples are taken from the same population. Consequently, each sample will tend to have a distribution similar to that of the population and a mean close to the population mean. Whatever shape the population distribution has, the distribution of the sample means will be close to normal, provided the samples are big enough.

This fact forms the basis of many statistical techniques and is the reason why your sample doesn't have to be normally distributed for them to work (however, with small samples, the distribution needs to be closer to normal than with large samples). It is also worth remembering that you only need to take one sample for the techniques to work. The theory requires you to imagine multiple samples to understand it, but in practice, one sample is sufficient. Understanding the normal distribution helps you understand a lot of statistical techniques.

Exploration

Use this game to explore how generating random numbers can lead to a normal distribution of sample means.

When you click the Sample button, the program will pick a sample of random numbers from 0 to 8 and then calculate their mean. How many numbers it picks is up to you - choose a number from 1 to 50 in the box provided.

The frequency histogram of the sample means will build up as you take more and more samples. To start again with a different sample size, click Clear.
You will need to click the Sample button repeatedly to build up a collection of sample means. Fast repeated clicks will get you there sooner!

  • Start with a sample size of 1. That will simply plot the distribution of the random numbers. It should be pretty flat after a while because the random numbers are being picked with equal probability - like rolling dice.
  • Then increase the sample size to 5, click Clear and start clicking Sample again. What happens to the distribution of means?
  • Now change the sample size to 30. Now what happens to the distribution of means?
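If the applet is not available, the same exploration can be approximated in code. The sketch below is a stand-in written for this page, not the applet's actual implementation: it assumes that each click of Sample picks whole numbers from 0 to 8 with equal probability, and the names simulate_clicks and show_histogram are hypothetical. It uses only the Python standard library.

```python
import random
import statistics
from collections import Counter


def simulate_clicks(sample_size, clicks=5000, seed=1):
    """Each 'click' picks sample_size whole numbers from 0 to 8 and records their mean."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.randint(0, 8) for _ in range(sample_size))
        for _ in range(clicks)
    ]


def show_histogram(means, width=50):
    """Text histogram of the means, grouped to the nearest 0.5 between 0 and 8."""
    counts = Counter(round(m * 2) / 2 for m in means)
    peak = max(counts.values())
    lines = []
    for k in range(17):  # the 17 bin centres 0.0, 0.5, ..., 8.0
        value = k / 2
        bar = "#" * round(width * counts.get(value, 0) / peak)
        lines.append(f"{value:4.1f} | {bar}")
    return "\n".join(lines)


# Follow the three steps above: sample sizes of 1, then 5, then 30.
for n in (1, 5, 30):
    print(f"Sample size {n}")
    print(show_histogram(simulate_clicks(n)))
    print()
```

With a sample size of 1 every mean is a whole number, so the rows for the half values stay empty. Compare the shape and the width of the three histograms as the sample size increases.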

( You need to enable Java to see this applet. )

Can you see that, although the data has a flat distribution, the sample means are approximately normally distributed? Remember, this takes LOTS of clicks on the Sample button.
What is the relationship between the width of the distribution and the size of the samples?

Application

Now think about your own data. What would happen if you collected the same amount of data again by measuring a different group of Applications from the same population?
Which of these statements is true about your sample data?
If you took another sample, measuring different Applications, would the mean of that sample be the same as the mean of your sample?
If you took 100 samples, and calculated the mean for each, how many mean values would you have?
These 100 means would form a distribution. Which of these options best describes this distribution?
Do you need to go and collect another 100 samples to make use of the theory explained on this page?
You learned at level 2 of this topic that a narrow normal distribution has a low standard deviation.
You saw above that larger samples lead to narrower sampling distributions.
What happens to the standard deviation of the distribution of sample means as sample size grows?