## The Normal Distribution


### Explanation

Introducing the Normal Distribution
We have seen how any variable can be plotted as a frequency histogram, which is a summary of how often different values (or ranges of values) occur. The shape of this histogram reflects what is known as the data's frequency distribution: how often each value appears in the data.

For reasons that are explained at level 3 of this topic, parametric techniques work better when the distribution of the sample data is close to one particular shape. On this page, we will introduce that distribution, look at its shape and see one way to compare the distribution of your sample to this ideal shape.

The distribution that we are talking about is known as the normal distribution and its histogram has the following properties:

• There is a single highest bar (the mode);
• There are as many values above the mode as there are below it (it is in the middle);
• The shape of the histogram is symmetrical about the mode, so the left side is a mirror image of the right;
• The frequency of values gets lower as you move further from the mode in a way that produces a bell shape.
The term 'Normal Distribution' means something quite specific and this page helps you to understand what that is. It just so happens that many natural sources of data do produce a normal distribution, so it is not too unreasonable to base a set of statistical techniques on the assumption that data has a normal distribution. However, please remember that the distribution is not called normal because it is common or usual.

Deciding Whether or Not Your Data is Normal
It can be useful to know whether or not your data is close to normally distributed. Looking at its frequency histogram is a good initial way to do so.

Here is a picture of a histogram produced from normally distributed data. Remember, it is the shape of the histogram that is important, not the raw data.

The sample that produced this histogram has a range from 0 to 8 and there are 100 values in the sample.

• The value 4 appears 30 times in the data - more often than any other - so 4 is the mode;
• The mode (4) is at the centre of the histogram;
• The shape is symmetrical (not perfectly, but close enough);
• You should be able to see how it is shaped like a bell - the frequencies fall away on both sides of the mode and tail off towards the ends of the range.
If your histogram looks like this (or close to it), then your sample is approximately normally distributed.
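The visual checks in the list above can also be written down as a rough sketch in Python. Everything here is illustrative: the bar heights are hypothetical (only the total of 100 values and the 30 occurrences of the value 4 come from the description above), and the function name `shape_report` and its `tolerance` are assumptions chosen for this example, not a standard test for normality.

```python
# Hypothetical bar heights for the values 0 to 8. Only the total (100) and the
# height of the bar at 4 (30) are taken from the text; the rest are made up.
counts = [1, 4, 12, 20, 30, 19, 10, 3, 1]


def shape_report(counts, tolerance=0.1):
    """Apply the informal histogram checks to a list of bar heights."""
    total = sum(counts)
    peak = max(counts)
    mode_index = counts.index(peak)
    centre = (len(counts) - 1) / 2

    # A single peak: bars never fall on the way up to the mode
    # and never rise again after it.
    rises = all(a <= b for a, b in zip(counts[:mode_index], counts[1:mode_index + 1]))
    falls = all(a >= b for a, b in zip(counts[mode_index:], counts[mode_index + 1:]))

    # Compare each bar with its mirror image. Every pair is visited twice,
    # so halve the sum of the differences.
    mirrored_gap = sum(abs(a - b) for a, b in zip(counts, reversed(counts))) / 2

    return {
        "single_peak": counts.count(peak) == 1 and rises and falls,
        "mode_at_centre": abs(mode_index - centre) <= 1,
        "roughly_symmetrical": mirrored_gap / total <= tolerance,
    }


print(shape_report(counts))
```

A report with all three entries true only says that the histogram passes these rough checks; looking at the plot itself is still the better guide.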

If you do not have a histogram of your data to look at, you can still get an idea about whether or not it is normal from the measures of central tendency.

• In normally distributed data, the mean, mode and median are all the same (or very close);
• In normally distributed data, the mode is near to the centre of the range.
If your sample does not have the qualities above, then it does not have a normal distribution. If it does have these qualities, it still might not be normal, so you should plot a histogram to find out.
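As a minimal sketch of the two checks above, the following Python uses the standard `statistics` module. The sample is hypothetical, and the function name `central_tendency_check` and the `tolerance` of one tenth of the range are assumptions made for illustration, since the text does not say how close "very close" needs to be.

```python
from statistics import mean, median, mode

# Hypothetical sample: 100 values between 0 and 8, built from made-up counts.
counts = [1, 4, 12, 20, 30, 19, 10, 3, 1]
sample = [value for value, n in enumerate(counts) for _ in range(n)]


def central_tendency_check(sample, tolerance=0.1):
    """Compare mean, median and mode, and locate the mode within the range."""
    low, high = min(sample), max(sample)
    centre = (low + high) / 2
    # "Close" is taken to mean within a fraction of the range (an assumption).
    allowed = tolerance * (high - low)

    m, md, mo = mean(sample), median(sample), mode(sample)
    return {
        "mean": m,
        "median": md,
        "mode": mo,
        "measures_agree": max(m, md, mo) - min(m, md, mo) <= allowed,
        "mode_near_centre": abs(mo - centre) <= allowed,
    }


print(central_tendency_check(sample))
```

If either check comes back false, the sample is unlikely to be normal. If both are true, the histogram is still needed to confirm the shape.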

What Use is the Normal Distribution?
The normal distribution is central to statistical theory, so it is useful to at least know what the phrase means. Some statistical techniques, which we shall learn about later in this tutorial, do not work very well on samples that are far from normal, so it is good to get into the habit of checking for normality once you have a sample.

### Exploration

To help you learn how to decide whether or not a histogram suggests that a distribution is normal, this game allows you to draw histograms of any shape you want. It draws a normal distribution shaped curve in red over the histogram to give you something to compare.
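One way to produce a comparison curve of this kind is to fit a normal distribution to the sample and work out how tall each bar would be under it. The sketch below does this with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). It is an assumption about how such a curve could be computed, not a description of how the game draws its red line, and both the sample and the helper name `expected_normal_counts` are hypothetical.

```python
from statistics import NormalDist

# Hypothetical bar heights for the values 0 to 8 (100 values in total).
counts = [1, 4, 12, 20, 30, 19, 10, 3, 1]
sample = [value for value, n in enumerate(counts) for _ in range(n)]


def expected_normal_counts(sample, values):
    """Bar heights a normal curve with the sample's mean and SD would give."""
    curve = NormalDist.from_samples(sample)
    n = len(sample)
    # Each bar for a whole-number value v covers the interval v - 0.5 to v + 0.5.
    return [n * (curve.cdf(v + 0.5) - curve.cdf(v - 0.5)) for v in values]


expected = expected_normal_counts(sample, range(len(counts)))
for value, (seen, ideal) in enumerate(zip(counts, expected)):
    print(f"{value}: observed {seen:3d}, normal curve {ideal:5.1f}")
```

Bars that sit far above or below the fitted heights are the places where the histogram departs most from the bell shape.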

To choose the histogram shape, simply click on the chart at the height you want a bar to appear. The sample size is fixed, so all the bar heights will change each time you click.

Try breaking the requirements for normality one at a time:
• Make more than one high peak - make the peaks far apart;
• Make the plot asymmetrical;
• Move the mode towards either end;
• Flatten the bell shape so that all the bars are the same height.
Which of these shapes is furthest from being normal?
If you have two high bars and the rest of the bars low, which arrangement of the two high bars is closest to being normal?

### Application

You can examine whether or not your data is normally distributed by plotting its histogram and looking to see if it is the right shape.

Here is the histogram for cost. This plot has 3 highest bars, which are for the ranges 71 to < 190.13, 428.38 to < 547.5, 547.5 to < 666.63. There is no symmetry in the distribution of the data.
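The ranges quoted above are equal-width bins, each including its lower limit and excluding its upper limit. As a rough sketch of how such a summary could be produced, the Python below bins a set of values and reports the ranges of the highest bars. The cost values are made up for illustration and are not the data behind the histogram above; `bin_counts` and `highest_bars` are hypothetical helper names.

```python
# Made-up cost values, used only to illustrate the binning.
costs = [10, 12, 15, 30, 42, 44, 47, 55, 58, 90]


def bin_counts(values, n_bins):
    """Count values in equal-width ranges of the form 'low to < high'."""
    low, high = min(values), max(values)
    if low == high:
        raise ValueError("all values are identical, so no ranges can be formed")
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The largest value sits exactly on the final edge, so put it in the last bar.
        index = min(int((v - low) / width), n_bins - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(n_bins + 1)]
    return edges, counts


def highest_bars(edges, counts):
    """Return the (low, high) range of every bar that ties for the greatest height."""
    peak = max(counts)
    return [(edges[i], edges[i + 1]) for i, c in enumerate(counts) if c == peak]


edges, counts = bin_counts(costs, n_bins=4)
print(counts)
print(highest_bars(edges, counts))
```

When more than one range is returned, or the single highest range lies near one end, the histogram does not have the single central mode that a normal distribution requires.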

Look at the shape of your histogram and answer the questions below to decide whether or not the distribution of your data is normal.
Is the histogram symmetrical?
Is the mode (the highest bar) at the centre of the histogram?
Does the histogram shape suggest that the data is normal?
