Calculating confidence intervals
A quick recap of the previous section (Understanding confidence intervals): confidence intervals are used to estimate the characteristics of a population from sample statistics.
For instance, suppose we would like to estimate the average height of all children aged 8 to 10 years in the state of Texas. Here, we are estimating the average height of the population from the average height of the sample of children we selected for analysis.
Let us take up the concepts of ‘margin of error’ and ‘standard error of mean’ before we resume our discussion on calculating confidence intervals.
Standard error of mean (SEM)
Simply put, the SEM is the standard deviation of the sample mean. If we drew many samples of the same size, it would measure how far their sample means typically fall from the actual population mean.
SEM = σ / √n
where σ is the standard deviation of the population and n is the sample size. The sample standard deviation s is used in place of σ only if the population standard deviation is not available, giving SEM = s / √n.
Because √n ≥ 1, the SEM is never larger than the standard deviation, and it is strictly smaller whenever n > 1.
The larger the sample size, the smaller the SEM: a larger sample represents the population more closely, so the sample mean tends to deviate less from the actual population mean. Since the SEM shrinks with √n, quadrupling the sample size halves it.
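To see how the SEM shrinks as the sample grows, the short Python sketch below evaluates σ/√n for a few sample sizes. The population standard deviation of 6 cm is an assumed value chosen for illustration, not a real figure for Texas children.

```python
import math

sigma = 6.0  # assumed population standard deviation in cm (illustrative only)

# SEM = sigma / sqrt(n) for a range of sample sizes
sems = {n: sigma / math.sqrt(n) for n in (1, 25, 100, 400)}
for n, sem in sems.items():
    print(f"n = {n:>3}: SEM = {sem:.2f} cm")
```

With n = 1 the SEM equals σ; each fourfold increase from 25 to 100 to 400 halves it (1.20, 0.60, 0.30 cm).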
The standard deviation is divided by √n in the SEM formula for the following reason: if samples of size n are drawn from a normal population with mean µ and standard deviation σ, then the sample mean follows a normal distribution with mean µ and standard deviation σ/√n.
For sufficiently large n, this holds approximately even if the population is not normal (by the central limit theorem). When the population standard deviation is not available, we use the sample standard deviation instead.
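The central limit theorem claim can be checked by simulation. The Python sketch below draws repeated samples from an exponential population (a skewed, clearly non-normal distribution with µ = σ = 1, chosen here purely for illustration) and compares the spread of the sample means with σ/√n. The sample size of 50 and the 5,000 repetitions are arbitrary choices, not values from the article.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the run is reproducible

# A deliberately skewed, non-normal population: exponential with rate 1,
# whose mean (mu) and standard deviation (sigma) are both 1.
mu, sigma = 1.0, 1.0
n = 50          # size of each sample (arbitrary choice)
repeats = 5000  # number of samples drawn (arbitrary choice)

def sample_mean(size):
    return statistics.fmean(random.expovariate(1.0) for _ in range(size))

means = [sample_mean(n) for _ in range(repeats)]
sem = sigma / math.sqrt(n)

print("mean of sample means:", round(statistics.fmean(means), 3))  # should be close to mu
print("SD of sample means:  ", round(statistics.stdev(means), 3))  # should be close to sigma / sqrt(n)
print("sigma / sqrt(n):     ", round(sem, 3))

# If the sample means are roughly normal, about 95% should lie within 1.96 SEM of mu.
within = sum(abs(m - mu) < 1.96 * sem for m in means) / repeats
print("share within 1.96 SEM:", round(within, 3))
```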
Margin of error
A confidence interval for the mean is a range of values centred symmetrically on the sample statistic (e.g., a range of values around the sample average height). The margin of error is the distance from the sample statistic to either end of that range.
So, to build the confidence interval, we need to calculate the appropriate margin of error at the chosen confidence level.
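The margin of error depends on the confidence level through a critical value. As a sketch, Python's standard-library `statistics.NormalDist` can compute the two-sided z critical value for any confidence level; the helper name `z_critical` is our own.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value z with P(-z < Z < z) = confidence for a standard normal Z."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```

This reproduces the familiar table values of about 1.645, 1.960 and 2.576: a higher confidence level needs a larger critical value and therefore a wider interval.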
SEM and Margin of error put together
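Now, let us link the SEM and the margin of error. Suppose we want to calculate the confidence interval for the mean at the 90% confidence level. Assuming that the sample mean follows a normal distribution, 90% of sample means fall within 1.645 standard errors of the population mean, so the margin of error is 1.645 × SEM and the interval is:
Sample mean − 1.645 × SEM ≤ Population mean ≤ Sample mean + 1.645 × SEM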
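The 90% interval translates directly into code. The sketch below uses hypothetical heights for ten children and an assumed known population standard deviation of 6 cm (neither figure comes from real data). It assumes the population is normal, so the z interval applies even for this small n. If `population_sd` is omitted, the function substitutes the sample standard deviation, which is only reasonable for large samples.

```python
import math
import statistics
from statistics import NormalDist

def z_interval(sample, confidence=0.90, population_sd=None):
    """Confidence interval for the mean: sample mean ± z * SEM."""
    n = len(sample)
    mean = statistics.fmean(sample)
    # Use the population SD when known; otherwise substitute the sample SD
    # (only sensible for large samples).
    sd = population_sd if population_sd is not None else statistics.stdev(sample)
    sem = sd / math.sqrt(n)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.645 for 90%
    margin = z * sem
    return mean - margin, mean + margin

# Hypothetical heights (cm) of ten children aged 8 to 10; illustrative only.
heights = [128, 135, 131, 140, 126, 133, 137, 129, 134, 132]
low, high = z_interval(heights, confidence=0.90, population_sd=6.0)
print(f"90% CI for the mean height: ({low:.2f}, {high:.2f}) cm")
```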
If the sample size is small, we cannot apply the central limit theorem (Central limit theorem and Normal distribution) to say that the sample mean approximately follows a normal distribution. In such cases, provided the population itself is approximately normal, we use the t-distribution with n − 1 degrees of freedom, and a t value replaces the z value.
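Python's standard library has no t-distribution, so the sketch below computes the t critical value itself by integrating Student's t density numerically (Simpson's rule) and inverting the CDF with bisection. In practice a library call such as SciPy's `scipy.stats.t.ppf(0.95, df)` does the same job. The helpers `t_cdf` and `t_critical` are illustrative names of our own, and the heights are hypothetical data assumed to come from a normal population with unknown σ.

```python
import math
import statistics

def t_cdf(x, df, steps=2000):
    """P(T <= x) for Student's t with df degrees of freedom, valid for x >= 0.

    Uses symmetry (P(T <= 0) = 0.5) plus Simpson's rule on the density over [0, x].
    """
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    h = x / steps
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3

def t_critical(confidence, df):
    """Two-sided critical value t with P(-t < T < t) = confidence, found by bisection.

    The search range [0, 100] covers common confidence levels unless df is very small.
    """
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical heights (cm); population assumed normal, sigma unknown.
heights = [128, 135, 131, 140, 126, 133, 137, 129, 134, 132]
n = len(heights)
mean = statistics.fmean(heights)
sem = statistics.stdev(heights) / math.sqrt(n)
t = t_critical(0.90, df=n - 1)
print(f"t = {t:.3f}; 90% CI: ({mean - t * sem:.2f}, {mean + t * sem:.2f}) cm")
```

Because t (about 1.833 for 9 degrees of freedom) exceeds z (about 1.645), the t interval is wider, reflecting the extra uncertainty from estimating σ with a small sample.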
When to use a z test and when to use a t test?
Broad guidelines on whether to use a z value or a t value:
- If the population distribution is not normal but the sample size is large enough – a z value can be used
- If the population distribution is normal and the sample size is large enough – a z value can be used
- If the population distribution is normal but the sample size is small – a t value can be used
- If the population distribution is not normal and the sample size is small – a z or t value CANNOT be used.
- If the population standard deviation is known, a z value can be used (consider the sample size as well)
- If the population standard deviation is unknown, a t value can be used (consider the sample size as well)
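One way to read these guidelines together is as a small decision rule. The Python sketch below is our own interpretation, not a standard algorithm: the cutoff of 30 for a "large enough" sample is a common rule of thumb assumed here, since the guidelines above do not fix a number, and the function name `choose_statistic` is hypothetical.

```python
def choose_statistic(population_normal, n, sigma_known, large_n=30):
    """Return "z", "t", or None following one reading of the broad guidelines.

    large_n is an assumed rule-of-thumb cutoff for a "large enough" sample.
    """
    if n >= large_n:
        # Central limit theorem: the sample mean is approximately normal.
        # With sigma unknown, t is also acceptable here because t approaches z.
        return "z"
    if not population_normal:
        return None  # small sample from a non-normal population: neither applies
    return "z" if sigma_known else "t"

print(choose_statistic(population_normal=True, n=10, sigma_known=False))
```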
Hope you have enjoyed this article! Let us know your feedback and suggestions! Follow us for more posts on statistics and analytics.