Introduction to Normal distribution
The normal distribution is a bell-shaped distribution of values whose curve is symmetrical around its mean.
It is a kind of continuous probability distribution.
The importance of the normal distribution follows from the fact that many naturally occurring distributions are approximately normal.
Commonly cited examples include stock returns over a long period of time, the distribution of students' marks in a test, and the heights of students in a class (assuming, in each case, a sufficiently large population).
The central limit theorem, which is pivotal to the subject of normal distributions, is covered in the next section of this ‘Normal distribution series’.
Some of the key features and characteristics of the normal curve are:
- Each normal curve is uniquely identified by the combination of its mean (µ) and standard deviation (σ)
- Symmetrical – 50% of values lie on either side of the mean
- Also referred to as the Gaussian distribution, named after the mathematician Carl Friedrich Gauss, although it was first described by Abraham de Moivre.
- Mean = Median = Mode for any normal distribution
- Standard normal distribution has mean 0 and standard deviation of 1.
- Among all continuous distributions with a given mean and standard deviation, the normal distribution has maximum entropy.
- Area under a normal curve is always 1. The y-axis shows the probability density at each value of x, and the area under the curve over an interval gives the probability of that interval. Since the probabilities of all possible outcomes must add up to 1, the total area under a normal curve is always 1.
- The normal curve theoretically never touches the x-axis. ‘Theoretically’, because in practice the population is finite and the observed values cover only a limited range.
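Several of the properties listed above can be checked numerically. The following is a minimal sketch using `statistics.NormalDist` from the Python standard library; the mean of 70 and standard deviation of 10 are arbitrary values chosen only for illustration, and the area is approximated with a simple midpoint sum rather than computed exactly.

```python
from statistics import NormalDist

# Illustrative parameters only (assumption): mean 70, standard deviation 10.
dist = NormalDist(mu=70, sigma=10)

# Symmetry: half of the probability lies below the mean.
prob_below_mean = dist.cdf(dist.mean)

# Mean = median: the median is the 50th percentile of the distribution.
median = dist.inv_cdf(0.5)

# Area under the curve, approximated with a midpoint sum over
# mean +/- 8 standard deviations (the tails beyond that are negligible).
n_steps = 16_000
width = 16 * dist.stdev
step = width / n_steps
start = dist.mean - 8 * dist.stdev
area = sum(dist.pdf(start + (i + 0.5) * step) for i in range(n_steps)) * step

# Far from the mean the curve is very close to the x-axis but still above it.
tail_density = dist.pdf(dist.mean + 6 * dist.stdev)

print(f"P(x < mean) = {prob_below_mean}")
print(f"median      = {median}")
print(f"area        = {area:.6f}")
print(f"density at mean + 6 sigma = {tail_density:.3e}")
```

The approximated area comes out very close to 1, and the density six standard deviations from the mean is tiny but still positive.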
Example of normal distribution curve
Understanding Normal distribution
What does the random variable x represent, and how should the values plotted on the y-axis be understood?
Let us take the distribution of marks in an exam as an example. Say the average mark scored by students in the test is 70 and the standard deviation of the marks is 10 (mean is 70 and std dev is 10).
We plot the distribution of marks, that is, the frequency of occurrence of each mark, on the y-axis. The x-axis contains the marks, and the y-axis represents the number of students at a particular mark (or in an interval of marks, if intervals are used on the x-axis).
So P(x), which is plotted on the y-axis, can be interpreted as the probability of occurrence of x (where x is a specific mark or an interval of marks). The relative frequency, that is, the number of students at x divided by the total number of students, is generally used as a proxy for this probability.
CAUTION: In the above example we considered the distribution of marks, which is a discrete distribution. However, with a large number of data points it can be approximated by a continuous distribution.
The normal distribution is a continuous distribution, so the probability of any single value of x is 0. Only intervals of x have non-zero probability, given by the area under the curve over the interval; the height of the curve at x is a probability density, not a probability.
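To make the difference between interval and point probabilities concrete, here is a small sketch for the marks example (mean 70, standard deviation 10), assuming the marks are modelled as exactly normal. It uses `statistics.NormalDist` from the Python standard library; the interval 60 to 80 is chosen only for illustration.

```python
from statistics import NormalDist

# Marks from the example: mean 70, standard deviation 10,
# modelled as a normal distribution (an approximation).
marks = NormalDist(mu=70, sigma=10)

# Probability that a mark falls between 60 and 80:
# the area under the curve over that interval.
p_60_to_80 = marks.cdf(80) - marks.cdf(60)

# Probability of one exact value: the area over a zero-width interval.
p_exactly_70 = marks.cdf(70) - marks.cdf(70)

# The height of the curve at 70 is a density, not a probability.
density_at_70 = marks.pdf(70)

print(f"P(60 <= x <= 80) = {p_60_to_80:.4f}")
print(f"P(x = 70)        = {p_exactly_70}")
print(f"density at 70    = {density_at_70:.4f}")
```

The interval within one standard deviation of the mean carries roughly 68% of the probability, while the single point 70 carries none even though the curve is at its highest there.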
Normal distribution function
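The density of a normal distribution with mean µ and standard deviation σ is usually written as f(x) = 1 / (σ √(2π)) · exp(−(x − µ)² / (2σ²)). A minimal sketch of that formula follows; the function name `normal_pdf` is hypothetical, and the result is cross-checked against `statistics.NormalDist` from the Python standard library using the marks example (mean 70, standard deviation 10).

```python
import math
from statistics import NormalDist


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density at x of a normal distribution with mean mu and std dev sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma  # distance from the mean in standard deviations
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


# Cross-check the hand-written formula against the standard library.
ours = normal_pdf(80, mu=70, sigma=10)
reference = NormalDist(mu=70, sigma=10).pdf(80)

print(f"normal_pdf(80)     = {ours:.6f}")
print(f"NormalDist.pdf(80) = {reference:.6f}")
```

In practice the library version is preferable; the hand-written function is only there to show which role µ and σ play in the formula.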
Standard Normal curve
As mentioned, each normal curve is identified by its mean and standard deviation. A normal curve with a mean of 0 and a standard deviation of 1 is called the standard normal curve, and its distribution is called the standard normal distribution.
The mean is the average of all values of x, and the standard deviation tells us how widely the values of x are spread around the mean.
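As a small illustration of these two quantities, the sketch below computes the mean and the population standard deviation of two sets of marks with the Python standard library. Both lists are made-up values (an assumption for illustration only): they share the same mean, but the second set is spread further from it.

```python
from statistics import fmean, pstdev

# Hypothetical marks of five students (made-up data).
tight_marks = [60, 70, 80, 70, 70]
spread_marks = [40, 70, 100, 70, 70]

# The mean is the average of all values.
mean_tight = fmean(tight_marks)
mean_spread = fmean(spread_marks)

# The population standard deviation measures the spread around the mean.
std_tight = pstdev(tight_marks)
std_spread = pstdev(spread_marks)

print(f"tight:  mean = {mean_tight}, std dev = {std_tight:.2f}")
print(f"spread: mean = {mean_spread}, std dev = {std_spread:.2f}")
```

Both sets average 70, but the second has a standard deviation three times as large because its values lie further from the mean.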
Interpretation from curves
For a given mean, normal curves with a smaller standard deviation are sharper (taller and narrower) than curves with a larger standard deviation. For example, in the curves below, the one with a standard deviation of 5 is sharper than the one with a standard deviation of 10 (both curves have the same mean of 100).
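The comparison can also be made numerically. The sketch below, using `statistics.NormalDist` from the Python standard library, computes the peak height of both curves and the share of values within 10 units of the mean; the 10-unit window is an arbitrary choice for illustration.

```python
from statistics import NormalDist

# The two curves from the example: same mean, different standard deviations.
narrow = NormalDist(mu=100, sigma=5)
wide = NormalDist(mu=100, sigma=10)

# Height of each curve at its peak, which is at the mean.
peak_narrow = narrow.pdf(100)
peak_wide = wide.pdf(100)

# Share of values within 10 units of the mean (90 to 110).
within_10_narrow = narrow.cdf(110) - narrow.cdf(90)
within_10_wide = wide.cdf(110) - wide.cdf(90)

print(f"peak height: sigma=5 -> {peak_narrow:.4f}, sigma=10 -> {peak_wide:.4f}")
print(f"within 90..110: sigma=5 -> {within_10_narrow:.4f}, sigma=10 -> {within_10_wide:.4f}")
```

Halving the standard deviation doubles the height of the peak, and a larger share of the values falls close to the mean, which is what makes the curve look sharper.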