The Theory Behind Confidence Intervals
To understand how confidence limits work, you need to understand several underlying concepts: the sampling distribution of the mean, the normal distribution, z-scores, and the standard error. We provide a tutorial for each of them, so if you are unsure of any, you should read the appropriate sections first.
If the above is clear, then the following should make sense. This is how confidence intervals are derived:
- Any single sample mean is one of many possible sample means that might have been found for different random samples. In theory, all of those other possible sample means form a distribution called the sampling distribution of the mean;
- The sampling distribution of the mean is normal in shape, regardless of the shape of the population distribution, provided the sample size is reasonably large (this is the Central Limit Theorem);
- The mean of the sampling distribution of the mean is the same as the population mean;
- The standard deviation of the sampling distribution of the mean can be estimated by dividing the standard deviation of the sample by the square root of N (this is known as the Standard Error of the sample and is discussed at level three of the standard deviation topic);
- We can measure the distance between a single value (such as our sample mean) and the mean of a normal distribution using z-scores. A z-score of 1 indicates that a value is one standard deviation from the mean. Z-scores can be converted into probabilities if they are from a normal distribution;
- In particular, remember that 95% of the values in normally distributed data lie within 1.96 standard deviations of the mean;
- If you move 1.96 standard deviations in both directions from the mean of a normal distribution, you will cover 95% of the data in that distribution;
- We do not know the population mean, but the converse of the above statement is also true: if you take any single sample mean and travel 1.96 standard deviations of the sampling distribution in both directions, then 95% of the time the resulting interval will contain the population mean;
- A single standard deviation in any sampling distribution of the mean is estimated as the standard deviation of a sample divided by the square root of the sample size;
- 1.96 standard deviations in the sampling distribution of the mean is simply 1.96 times the figure in the point above. This gives us the formula for confidence intervals that you can see at level two.
- If 95 out of 100 sample means lie within 1.96 standard errors of the population mean, then we can be 95% confident that the interval built around any single sample mean contains the population mean. That is how the confidence limit is derived.
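The steps above can be sketched in code. This is a minimal illustration of the formula mean ± 1.96 standard errors, using only the standard library; the sample data is made up for the example:

```python
import math
import statistics

def confidence_interval_95(sample):
    """95% confidence interval for the population mean,
    estimated from a single sample as mean ± 1.96 standard errors."""
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)        # sample standard deviation
    se = sd / math.sqrt(len(sample))     # standard error of the mean
    return (mean - 1.96 * se, mean + 1.96 * se)

# Hypothetical measurements with mean 5.0:
sample = [4.8, 5.1, 5.3, 4.9, 5.0, 5.2, 5.1, 4.7, 5.0, 4.9]
low, high = confidence_interval_95(sample)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

Note that the interval is centred on the sample mean, and its half-width is exactly 1.96 standard errors.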
95% of sample means lie within 1.96 standard errors of the population mean. Any single sample mean consequently has a 95% chance of being within 1.96 standard errors of the true population mean.
You know that the standard error of a sample is its standard deviation divided by the square root of the sample size, so the following should be clear:
- Larger samples lead to lower standard errors;
- Smaller samples lead to higher standard errors;
- Larger standard deviations lead to higher standard errors;
- Smaller standard deviations lead to lower standard errors.
And, of course, lower standard errors lead to tighter confidence intervals.
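These relationships follow directly from the formula. A quick sketch, using an arbitrary standard deviation of 10 for illustration:

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: sample sd divided by sqrt(n)."""
    return sd / math.sqrt(n)

# Same standard deviation, increasing sample size:
for n in (10, 40, 160):
    print(n, round(standard_error(sd=10.0, n=n), 3))
```

Because the sample size sits under a square root, quadrupling the sample size only halves the standard error.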
If you want to use a confidence interval that is not 95%, here are the distances away from the mean for some other percentages of a normal distribution. The distances are in units of standard errors (measured from the sample).
|Confidence level|Standard Errors from mean|
|---|---|
|90%|1.645|
|95%|1.96|
|99%|2.58|
These portions are shown in the diagram below. The scale along the bottom is number of standard errors and the coloured portions show the proportions of the data from the table above. To cover 99% of the data, you are including all coloured portions in both directions. To cover 95%, you include all the red, blue and green sections.
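If you need a critical value for a confidence level not in the table, Python's standard library can compute it: for a central proportion p of a normal distribution, the distance in standard errors is the inverse normal CDF evaluated at (1 + p)/2.

```python
from statistics import NormalDist

# How many standard errors either side of the mean cover a given
# central proportion of a normal distribution:
for level in (0.90, 0.95, 0.99):
    z = NormalDist().inv_cdf((1 + level) / 2)
    print(f"{level:.0%}: {z:.3f}")
```

We pass (1 + p)/2 rather than p because the leftover probability (1 - p) is split between the two tails of the distribution.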