**Introduction to Hypothesis testing**

Hypothesis testing: When a statement or claim is made, we can either agree with it or try to disprove it. Agreeing requires no effort, but to disprove the claim we must test or experiment on either the entire population or a sample drawn from it. Testing the entire population gives an exact answer, but it is often impractical because of the time, effort and cost involved.

If we test the claim using data from a sample, we must show that the result is statistically significant, so that the test and the inferences drawn from it are credible.

We formulate a hypothesis in order to test a particular claim, and then either reject it or fail to reject it. Ho and H1 represent the null and alternate hypothesis respectively.

**Formulating hypothesis**

*Simple rule of thumb – Alternate hypothesis is what we want to prove.*

For example, suppose the manufacturer of a particular brand of drink claims that its alcohol content is 3%. If we accept the claim, no effort is required from our side. If we challenge the claim and want to disprove it, we should formulate a hypothesis and decide whether to reject it based on actual results.

In the above case Alternate hypothesis will be ‘Alcohol content in the drink NOT EQUAL to 3%’ and null hypothesis is ‘Alcohol content in the drink is EQUAL to 3%’.

*Ho: Alcohol content = 3% (Original claim)*

*H1: Alcohol content in the drink ≠ 3% (We want to prove this)*

Let us say, in the above case, we randomly purchase 40 different drink bottles and test them for alcohol content. The average alcohol content comes out to be 3.04% and the standard deviation of the sample is 0.15%. What is the next step? What can we infer from this sample? And how do we estimate the average alcohol content for the entire population (all the drink bottles of this brand in the market)?

Let us first understand the P-value and Alpha (α), as these form the basis of our calculations.

**Alpha (α)**

Alpha (α) is the area under the normal curve that lies outside the central 95% area around the mean (for a 95% confidence level). At a 95% confidence level, 5% of the area lies outside this region. In a two-tailed test this 5% is split between the two sides, giving 2.5% of 'external area' on each side; in a one-tailed test the whole 5% sits on one side. Hence, α is 0.05 in total (0.025 in each tail for a two-tailed test), and it is the threshold against which the calculated P-value is compared. But what is the P-value?

*[Figures: α regions for a two-tailed test and for a one-tailed test]*
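To make the 'external area' idea concrete, here is a minimal sketch that turns α into a cutoff on the standard normal curve. It assumes Python's standard-library `statistics.NormalDist`; the helper name `critical_z` is made up for this illustration and is not part of the original article.

```python
from statistics import NormalDist

def critical_z(alpha: float, two_tailed: bool = True) -> float:
    """Return the positive z cutoff that leaves the alpha area outside it.

    Hypothetical helper: for a two-tailed test, alpha is split equally
    between both tails; for a one-tailed test, it all sits in one tail.
    """
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)

# 95% confidence level, i.e. alpha = 0.05
print(round(critical_z(0.05), 2))                    # two-tailed: 1.96
print(round(critical_z(0.05, two_tailed=False), 2))  # one-tailed: 1.64
```

A z-value beyond the cutoff falls in the α region, which is the same as its P-value being smaller than α.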

**P value**

The P-value is the probability, assuming the null hypothesis is true, of obtaining a sample result at least as extreme as the one actually observed. A small P-value means the observed data would be unlikely if the null hypothesis were true, which counts as evidence against it. Rejecting a null hypothesis that is actually true is an error, and ideally we want the probability of that error to be very low. Hence, we keep confidence levels at 95% or 99% (the corresponding area in each tail is 2.5% or 0.5% respectively).

Alpha (α) is the maximum probability we are willing to accept of rejecting the null hypothesis when it is actually true. At a 95% confidence level it is 5%, and at a 99% confidence level it is 1%.

### How to calculate P value?

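- Z value is calculated as [(Calculated sample mean – Population mean under Ho)/ (σ/sqrt(n))], where σ is the standard deviation and n is the sample size.
- Look up the tail area for this Z value in 'Standard Normal tables'. This is the P value for a one-tailed test; for a two-tailed test, double it.
- Compare this P value with Alpha (α). If the P value is smaller than α, we reject the null hypothesis; otherwise we cannot reject it.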
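The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the standard-library `statistics.NormalDist` in place of a printed table, computes a two-tailed P value, and the function names `z_statistic`, `two_tailed_p_value` and `reject_null` are invented here.

```python
from math import sqrt
from statistics import NormalDist

def z_statistic(sample_mean: float, claimed_mean: float,
                std_dev: float, n: int) -> float:
    """Step 1: distance of the sample mean from the claimed mean,
    measured in standard errors (std_dev / sqrt(n))."""
    return (sample_mean - claimed_mean) / (std_dev / sqrt(n))

def two_tailed_p_value(z: float) -> float:
    """Step 2: area in both tails of the standard normal beyond |z|
    (replaces the table lookup)."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def reject_null(p_value: float, alpha: float = 0.05) -> bool:
    """Step 3: reject the null hypothesis only if p_value < alpha."""
    return p_value < alpha

# Sanity check: z = 1.96 gives a two-tailed P value of about 0.05
print(round(two_tailed_p_value(1.96), 3))
```

Splitting the steps into separate functions mirrors the list above; in practice they could be combined into one call.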

**Scenarios of Hypothesis**

In our previous example, we wanted to prove that the alcohol content is not equal to 3%. We are not saying precisely that it will be more than 3% or less than 3%; all we are trying to prove is that it is NOT EQUAL to 3% (it could be higher or lower). Hence, we use a two-tailed (two-sided) test.

Suppose a company claims that the lead content in its packaged food brand does not exceed 0.005 mg, and we want to challenge this claim. We may formulate the hypothesis as below.

*Ho: Lead content <= 0.005 mg (Original claim)*

*H1: Lead content > 0.005 mg (We want to prove this)*

Here, we use a one-sided P-value taken from the right tail, i.e. the area to the right of the claimed value (0.005 mg).
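To see how the choice of tail changes the P-value, the sketch below evaluates a right-tailed and a two-tailed P-value for the same z. The z-value of 2.0 is an arbitrary number picked for illustration (it is not lead-content data), and the function names are invented for this example.

```python
from statistics import NormalDist

def right_tailed_p(z: float) -> float:
    """Area under the standard normal curve to the right of z
    (used when H1 says 'greater than')."""
    return 1 - NormalDist().cdf(z)

def two_tailed_p(z: float) -> float:
    """Area in both tails beyond |z| (used when H1 says 'not equal')."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

z = 2.0  # arbitrary illustrative z-value, not taken from any real sample
print(f"right-tailed p = {right_tailed_p(z):.4f}")
print(f"two-tailed p   = {two_tailed_p(z):.4f}")
```

For a positive z, the right-tailed P-value is half of the two-tailed one, so the same sample can be significant in a one-tailed test and not in a two-tailed test.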

**Coming back to our example,**

As discussed in the Normal distribution section, we assume that the alcohol content in the branded drink follows a normal distribution. The population mean is 3% (had the claim been true). We calculate the z-value corresponding to 3.04% (our sample mean), using the sample standard deviation of 0.15% and the sample size of 40.

Z – Value = (3.04% – 3%)/ (0.15%/sqrt(40)) ≈ 1.69

Calculate the corresponding two-tailed 'P' value for z ≈ 1.69; it is about 0.092

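Compare the above P-value with the significance level (α); for a 95% confidence level, α is 0.05.

Since the calculated P-value is greater than 0.05, we CANNOT reject the null hypothesis.

Hence, we CANNOT reject the claim made by the manufacturer. This does not prove that the claim is true; it only means that this sample does not provide enough evidence against it.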
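The whole calculation for the drink example can be reproduced with a short script. This is a minimal sketch that assumes a two-tailed z-test and uses Python's standard-library `statistics.NormalDist` instead of a printed table; the variable names are chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist

# Figures from the example (all in percent alcohol content)
claimed_mean = 3.00   # Ho: alcohol content = 3%
sample_mean = 3.04    # average of the 40 sampled bottles
sample_sd = 0.15      # sample standard deviation
n = 40                # number of bottles tested
alpha = 0.05          # 95% confidence level

# Z value: how many standard errors the sample mean is from the claim
standard_error = sample_sd / sqrt(n)
z = (sample_mean - claimed_mean) / standard_error

# Two-tailed P value, because H1 is 'not equal to 3%'
p = 2 * (1 - NormalDist().cdf(abs(z)))

reject_null = p < alpha
print(f"z = {z:.2f}, p-value = {p:.3f}, reject Ho: {reject_null}")
```

Running it gives a z-value of roughly 1.69 and a two-tailed P-value of roughly 0.09, which is above 0.05, so the null hypothesis is not rejected.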
