Introduction to five number summary

Given a dataset, the five number summary gives a quick picture of how the data is distributed. It reports the minimum, the three quartiles and the maximum of the data.

Before calculating the five number summary, it is recommended that you sort the dataset in ascending order. This is not a mandatory step, but it makes manual calculation much easier.

Five number summary

It consists of the following five values:

  1. Minimum value – The lowest value in the dataset
  2. Q1 (First quartile) – The value below which 25% of the data falls
  3. Q2 (Median) – The value below which 50% of the data falls
  4. Q3 (Third quartile) – The value below which 75% of the data falls
  5. Maximum value – The highest value in the dataset

From a percentile perspective, Q1, the median and Q3 correspond to the 25th, 50th and 75th percentiles respectively.
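The five values listed above can be computed with a few lines of Python. This is only a minimal sketch: the function name `five_number_summary` and the sample data are made up for illustration, and it assumes the "inclusive" quartile convention of the standard library's `statistics.quantiles`. Other tools and textbooks use slightly different conventions, so the quartiles they report for the same data may differ a little.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for at least two numbers."""
    data = sorted(values)  # sorting first, as recommended for manual work
    # n=4 splits the data into quarters and returns the three cut points.
    # method="inclusive" is an assumption; other quartile conventions exist.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


# Hypothetical sample data, not the dataset from this article.
sample = [7, 1, 9, 3, 5, 2, 8, 4, 6]
print(five_number_summary(sample))  # (1, 3.0, 5.0, 7.0, 9)
```

For this sample, 25% of the values lie at or below 3, half lie at or below 5, and 75% lie at or below 7.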

Example

Consider the dataset given below.

5NS1.0.jpg

The five number summary for the above dataset is presented below.

5NS2.0.jpg

Box and whisker plot

A box plot shows the five number summary graphically. Shown below is the box plot for the dataset considered above.

Box plot

How to identify Outliers in data?

Outliers are values that lie far away from the rest of the values in the data (the odd ones out).

Outliers can occur on the high side or on the low side. It is important to identify them because they can heavily influence the outcome of analyses performed on the data.

Identifying outliers: Now that we know that outliers deviate far from the other values in the dataset, the next task is to decide how much deviation makes a data point an outlier.

1.5 times IQR rule: This is the most commonly used method to identify outliers.

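IQR refers to the interquartile range and is given by

IQR = Q3 - Q1

Under the 1.5 times IQR rule, a data point is treated as an outlier if it lies below Q1 - 1.5*IQR or above Q3 + 1.5*IQR.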
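As a rough sketch of the rule in code: compute IQR = Q3 - Q1, then flag every value below Q1 - 1.5*IQR or above Q3 + 1.5*IQR. The function name `iqr_outliers`, the parameter `k` and the sample data are hypothetical, and the quartiles again assume the "inclusive" convention of `statistics.quantiles`. A different quartile convention can shift the cut-offs slightly.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    data = sorted(values)
    # Assumed convention: "inclusive" quartiles; other methods exist.
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [x for x in data if x < lower_fence or x > upper_fence]


# Hypothetical data, not the task dataset from this article.
sample = [2, 4, 5, 6, 7, 8, 9, 10, 50]
print(iqr_outliers(sample))  # [50]
```

Here Q1 = 5 and Q3 = 9, so IQR = 4 and the cut-offs are -1 and 15. Only 50 falls outside them.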

Task:

Identify the outliers in the data below using the 1.5*IQR rule.

Data: {4, 7, 8, 1, 2, 3, 9, 10, 10, 11, 12, 13, 13, 14, 15, 22}