Introducing the Normal Distribution
We have seen how any variable can be plotted as a frequency histogram, which summarises how often different values (or ranges of values) occur. The shape of this histogram reflects what is known as the data's frequency distribution, that is, how often each value appears in the data.
For reasons that are explained at level 3 of this topic, parametric techniques work better when the distribution of the sample data is close to a particular shape. On this page, we will introduce that distribution, look at its shape and see one way to compare the distribution of your sample to this ideal shape.
The distribution that we are talking about is known as the normal distribution. The term 'normal distribution' means something quite specific, and this page helps you to understand what that is. Many natural sources of data happen to produce distributions that are close to normal, so it is not unreasonable to base a set of statistical techniques on the assumption that data are normally distributed. However, please remember that the distribution is not called normal because it is common or usual.
The histogram of normally distributed data has the following properties:
- There is a single highest bar (the mode);
- There are as many values above the mode as there are below it (it is in the middle);
- The shape of the histogram is symmetrical about the mode, so the left side is a mirror image of the right;
- The frequency of values gets lower as you move further from the mode in a way that produces a bell shape.
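The four properties above can be turned into a rough check on a list of histogram bar heights. This is a minimal heuristic sketch for illustration, not a formal test of normality: the function name `looks_bell_shaped` and the `tolerance` parameter (how much imperfection to allow) are hypothetical choices made for this example.

```python
def looks_bell_shaped(counts, tolerance=0.25):
    """Roughly check bar heights (lowest bin first) against the four properties.

    tolerance is a hypothetical allowance for imperfect data: how far the two
    sides may differ, as a fraction of the total count (property 2) or of the
    tallest bar (property 3).
    """
    peak = max(counts)
    total = sum(counts)

    # 1. There is a single highest bar (the mode).
    if counts.count(peak) != 1:
        return False
    m = counts.index(peak)
    left = counts[:m][::-1]  # bars moving leftwards, away from the mode
    right = counts[m + 1:]   # bars moving rightwards, away from the mode

    # 2. Roughly as many values below the mode as above it.
    if abs(sum(left) - sum(right)) > tolerance * total:
        return False

    # 3. Roughly symmetrical: bars the same distance from the mode have similar
    #    heights (a bar missing on one side counts as height 0).
    for k in range(max(len(left), len(right))):
        a = left[k] if k < len(left) else 0
        b = right[k] if k < len(right) else 0
        if abs(a - b) > tolerance * peak:
            return False

    # 4. Frequencies never rise again as you move away from the mode.
    for side in (left, right):
        if any(further > nearer for nearer, further in zip(side, side[1:])):
            return False
    return True


print(looks_bell_shaped([1, 4, 11, 20, 30, 20, 9, 4, 1]))  # True
print(looks_bell_shaped([30, 20, 10, 5, 2]))               # False
```

The second histogram fails because its mode sits at one end, so almost all of its values lie on one side of the mode. A histogram with two equally tall bars, or one whose bars rise again away from the mode, would also fail.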
Deciding Whether or Not Your Data is Normal
It can be useful to know whether or not your data is close to normally distributed. Looking at its frequency histogram is a good initial way to do so.
Here is a picture of a histogram produced from normally distributed data. Remember, it is the shape of the histogram that is important, not the raw data.
The sample that produced this histogram has a range from 0 to 8, and there are 100 values in the sample. The histogram shows each of the properties listed above:
- The value 4 appears 30 times in the data, more often than any other, so 4 is the mode;
- The mode (4) is at the centre of the histogram, halfway between 0 and 8;
- The shape is symmetrical (not perfectly, but close enough);
- You should be able to see how it is shaped like a bell: the frequencies fall slowly just either side of the mode, drop more steeply further out, and then taper off into the tails.
If your histogram looks like this (or close to it), then your sample is probably close to normally distributed.
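If you do not have plotting software to hand, a frequency table can be printed as a simple text histogram to inspect its shape. This minimal sketch uses Python's `collections.Counter`. The helper name `text_histogram` is hypothetical, and the sample is invented for illustration to match the description above (100 whole numbers from 0 to 8, with 4 appearing 30 times); it is not the data behind the pictured histogram.

```python
from collections import Counter

def text_histogram(values):
    """Return one line per whole-number value: the value, a bar of '#', its frequency."""
    counts = Counter(values)
    lines = []
    for value in range(min(values), max(values) + 1):
        n = counts[value]  # Counter gives 0 for values that never occur
        lines.append(f"{value}: {'#' * n} ({n})")
    return lines

# Hypothetical frequencies for the values 0 to 8 (invented for illustration).
frequencies = [1, 4, 11, 20, 30, 20, 9, 4, 1]
sample = [value for value, count in enumerate(frequencies) for _ in range(count)]

for line in text_histogram(sample):
    print(line)
```

The longest bar belongs to 4, and the bars shorten on both sides in roughly mirror-image fashion, which is the pattern to look for in your own data.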
If you do not have a histogram of your data to look at, you can still get an idea of whether or not it is normal from the measures of central tendency:
- In normally distributed data, the mean, mode and median are all the same (or very close);
- In normally distributed data, the mode is near the centre of the range.
If your sample does not have these qualities, then it is unlikely to have a normal distribution. If it does have them, it still might not be normal, so you should plot a histogram to find out.
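The central-tendency check can be sketched with Python's standard `statistics` module. The helper name `central_tendency_summary` is hypothetical, the sample is invented for illustration (100 whole numbers from 0 to 8), and the "centre of the range" is assumed here to mean the midpoint between the smallest and largest values.

```python
import statistics

def central_tendency_summary(values):
    """Return the mean, median, mode and centre of the range of a sample."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        # Midpoint between the smallest and largest values.
        "centre of range": (min(values) + max(values)) / 2,
    }

# Hypothetical frequencies for the values 0 to 8 (invented for illustration).
frequencies = [1, 4, 11, 20, 30, 20, 9, 4, 1]
sample = [value for value, count in enumerate(frequencies) for _ in range(count)]

for name, result in central_tendency_summary(sample).items():
    print(f"{name}: {result}")
```

For this sample the mean is 3.96, while the median, mode and centre of the range are all 4, so it passes this check. As the text says, passing does not prove the data are normal; a histogram is still needed to confirm the shape.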
What Use is the Normal Distribution?
The normal distribution is central to statistical theory, so it is useful to at least know what the phrase means. Some statistical techniques do not work very well on samples that are far from normal, so it is good to get into the habit of checking for normality once you have a sample. We shall learn about those techniques later in this tutorial.