In the previous sections, we saw how to construct a hypothesis (Hypothesis testing) and how to deal with the different types of errors involved in hypothesis testing (Types of errors in Hypothesis testing).
You may have observed that both of those topics dealt with data from a sample drawn from a single population; we worked with only one sample. What if we have to compare two populations?
In this section, we will see how to test hypotheses about samples drawn from different populations, that is, how to compare two populations.
Comparing two groups or populations
Comparing two populations may mean comparing their means, their proportions, or their variances.
For example, we may want to compare the proportion of people who spend more than 70% of their income in India with the corresponding proportion in the US. Here, we are comparing the proportions of two populations.
For ease of discussion, we compare the means (averages) of two populations in this section.
At a very broad level, there are two types of tests:
- Parametric tests
- Non-Parametric tests
Parametric tests make assumptions about the distribution of the populations that we intend to compare.
For example, if we want to use an independent samples t-test, we assume that each population is normally distributed.
Non-parametric tests do not do away with assumptions entirely, but their assumptions are less stringent; mainly, they place no restriction on the distribution of the population (such as normality).
Parametric tests are further divided into different types, depending on the situation/context, the data, and the sample size:
- Independent samples test
- Paired samples test
Independent samples test
Now, let's look at the scenarios in which we compare two populations.
Suppose the problem at hand is to compare the average monthly spending on entertainment (per person) in Delhi and Mumbai. We draw two samples, one from Delhi and one from Mumbai, and compare their average spending.
H0 (Null Hypothesis): Avg. spending in Delhi – Avg. spending in Mumbai = 0
H1 (Alternate Hypothesis): Avg. spending in Delhi – Avg. spending in Mumbai is not equal to 0
Here, the two samples are independent of each other, and such a test is called an independent samples test.
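As a rough illustration of the Delhi/Mumbai comparison, the sketch below computes a t statistic for two independent samples using only the Python standard library. The spending figures are made up for illustration, the function name `welch_t` is hypothetical, and the choice of the Welch form (which does not assume equal variances in the two populations) is an assumption made here, not something fixed by the discussion above.

```python
import math
import statistics

# Hypothetical monthly entertainment spend per person (made-up numbers).
delhi = [2100, 1850, 2400, 1950, 2250, 2050, 1900, 2300]
mumbai = [2500, 2350, 2700, 2200, 2600, 2450, 2300, 2550]


def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for two
    independent samples, without assuming equal variances (Welch's form)."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a = statistics.mean(sample_a)
    mean_b = statistics.mean(sample_b)
    # statistics.variance uses the sample (n - 1) denominator.
    var_mean_a = statistics.variance(sample_a) / n_a
    var_mean_b = statistics.variance(sample_b) / n_b
    # Difference in sample means divided by its estimated standard error.
    t_stat = (mean_a - mean_b) / math.sqrt(var_mean_a + var_mean_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (var_mean_a + var_mean_b) ** 2 / (
        var_mean_a ** 2 / (n_a - 1) + var_mean_b ** 2 / (n_b - 1)
    )
    return t_stat, df


t_stat, df = welch_t(delhi, mumbai)
print(f"Mean difference (Delhi - Mumbai): "
      f"{statistics.mean(delhi) - statistics.mean(mumbai):.2f}")
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

A t statistic far from zero (in either direction, since H1 is two-sided) is evidence against H0. If SciPy is available, `scipy.stats.ttest_ind(delhi, mumbai, equal_var=False)` performs the same kind of test and also returns a p-value.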
Paired samples test
While an independent samples test may be unavoidable in some cases, pairing the test values is the suggested method wherever possible.
For instance, suppose the problem at hand is to understand the effect of a drug on the performance of athletes (runners).
Here, the ideal experiment would be to take the same sample and test its performance once with the drug and once without it.
In the above example, we are checking the performance of the same set of people (before and after taking the drug). This gives a more accurate picture than comparing two different groups of athletes, one taking the drug and the other not, because differences between individual athletes do not get mixed up with the effect of the drug.
H0 (Null Hypothesis):
Performance using the drug = Performance without using the drug
H1 (Alternate Hypothesis):
Performance using the drug NOT EQUAL TO Performance without using the drug
Here, the objective is to understand the influence of the drug (we do not know whether its impact is positive or negative), so we have a two-tailed test.
This is an example of a paired samples test, where the same sample is tested under different conditions (once with the drug and once without it).
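The drug example can be sketched in the same spirit. The snippet below is a minimal illustration using only the Python standard library: it takes the per-athlete difference in performance and computes a t statistic on those differences. The running times and the function name `paired_t` are hypothetical and chosen only for illustration.

```python
import math
import statistics

# Hypothetical 100 m times in seconds for the same six runners (made-up numbers).
without_drug = [11.8, 12.1, 11.5, 12.4, 11.9, 12.0]
with_drug = [11.6, 12.0, 11.6, 12.1, 11.7, 11.9]


def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for paired observations.

    The statistic is computed on the per-subject differences (after - before).
    """
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    # Mean difference divided by its estimated standard error.
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    return t_stat, n - 1


t_stat, df = paired_t(without_drug, with_drug)
print(f"t = {t_stat:.3f}, df = {df}")
```

Because each athlete serves as their own control, only the differences enter the calculation. If SciPy is available, `scipy.stats.ttest_rel(with_drug, without_drug)` runs the same kind of paired test and also returns a two-sided p-value.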
In this type of test, we test the same subjects at different time points or under different conditions.
Below is an example of a paired samples test in which the same sample is tested at different time points.
In the next section, we will look at the mathematical aspects and calculation of parametric tests.