# Box Plot

Yesterday, I unexpectedly ran into a former classmate at a nearby club. We had a lot to catch up on, as we hadn't seen each other in a long time. Our conversation eventually turned to the future.

'Friend':

"Hey, do you remember that I always wished to pursue higher studies? I think this is the right time. I am planning to pursue my Master's, an MBA (Master of Business Administration)."

'Me':

"Oh wow! Congratulations! So when are you starting, and which university have you joined?"

'Friend':

"I have not yet finalized any university. I am willing to join any university, though."

'Me':

"Which universities did you shortlist?"

'Friend':

"I have selected a few, but I am finding it difficult to choose. Can you help me decide?"

'Me':

"Oh sure! Share the information that you have collected!"

'Friend':

"Certainly!"

I did my part of the research and collected a few more details about those Universities like:

- What sort of examination does one need to clear to apply?
- What are the minimum marks that one would need to score to secure admission in a University?
- What are the acceptance criteria of the University?
- What is the ratio of students to faculty?
- Are there any scholarship programs? If so, what is the maximum scholarship allowed?
- What would be the minimum expenses?
- What would be the salary package for graduates of the University? etc.

One of the university webpages included a message that caught my eye: "Students at the University earn an average of $132,000 annually."

I had gathered and organised the data necessary to begin my research, and I also had some intriguing thoughts to share with my buddy.

- First Moment Business Decision or Measure of Central Tendency
  - **Mean**: Average of all the data in a column/feature
  - **Median**: Middle value of the sorted data in a column/feature
  - **Mode**: Most frequently occurring value; the usual choice when the data is categorical
- Second Moment Business Decision or Measure of Dispersion
  - **Variance**: Average squared deviation from the mean; it describes how dispersed the data is, but in squared units
  - **Standard Deviation**: Square root of the variance; it overcomes the squared-units problem of the variance by returning to the original units of the data
  - **Range**: Difference between the maximum and the minimum value in the data
- Third Moment Business Decision or Skewness
- Fourth Moment Business Decision or Kurtosis
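To make the first two moments concrete, here is a minimal Python sketch using only the standard library. The salary figures are hypothetical (they are not the university data discussed in this article) and were chosen so that one extreme value pulls the mean well above the median.

```python
import statistics

# Hypothetical annual salaries in thousands of dollars (not the article's data).
salaries = [50, 52, 55, 55, 60, 150]

# First moment: measures of central tendency
mean = statistics.mean(salaries)
median = statistics.median(salaries)
mode = statistics.mode(salaries)

# Second moment: measures of dispersion
variance = statistics.variance(salaries)      # sample variance, in squared units
std_dev = statistics.stdev(salaries)          # square root of the variance, in original units
value_range = max(salaries) - min(salaries)   # maximum minus minimum

print(f"mean={mean:.2f}, median={median}, mode={mode}")
print(f"variance={variance:.2f}, std_dev={std_dev:.2f}, range={value_range}")
```

The single value of 150 lifts the mean to about 70 while the median stays at 55, which is the kind of distortion a box plot makes visible.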

**Graphical Representation**

- Univariate – Requires one variable/feature to get a plot
  - Histogram
  - Bar plot
  - Box plot
  - Q-Q plot
- Bivariate – Requires two variables to get a plot
- Multivariate – Requires many variables to get a plot

In this article, I will explain one of the most helpful univariate graphical representations: the box plot.

**What is a Box Plot?**

A box plot is a graphical representation of how the values in the data are spread out.

**What is the information provided by a Box Plot?**

A box plot provides the following information:

- **Median (Q2/50th Percentile)**: The middle value of the data
- **First Quartile (Q1/25th Percentile)**: The middle number between the smallest number and the median of the data
- **Third Quartile (Q3/75th Percentile)**: The middle number between the median and the highest number of the data
- **Whisker**: A line extending from each end of the box. Hence, a "Box Plot" is also called a "Whisker Plot". When there are no outliers, the lower and upper whiskers each cover about 25% of the data
- **Outliers**: Any data point that lies beyond the whiskers is plotted as an outlier with a dot
- **Inter Quartile Range (IQR)**: The difference between the 75th and 25th percentiles, that is, IQR = Q3 – Q1
- **Minimum**: Q1 – 1.5 × IQR. This is the lowest value a whisker can reach; the lower whisker stops at the smallest data point that is not below this limit
- **Maximum**: Q3 + 1.5 × IQR. This is the highest value a whisker can reach; the upper whisker stops at the largest data point that is not above this limit

**Box Plot on Normal Distribution:**

Let us try to understand the box plot on a normal distribution and its probability density function.

[Figure: Box Plot on Normal Distribution]

**What is a normal distribution?**

The normal distribution is often summarized by the 68 - 95 - 99.7 rule. The points to consider here are:

- 68% of the data is within 1 standard deviation (σ) of the mean (μ)
- 95% of the data is within 2 standard deviations (σ) of the mean (μ)
- 99.7% of the data is within 3 standard deviations (σ) of the mean (μ)
- About 0.7% of the data lies beyond the box plot whiskers (below Q1 – 1.5 × IQR or above Q3 + 1.5 × IQR) and is flagged as outliers
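These percentages can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) on a standard normal distribution. The helper name `share_within` is my own, introduced only for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} sigma: {share_within(k):.4f}")

# Quartiles of the standard normal, expressed in units of sigma
q1 = std_normal.inv_cdf(0.25)
q3 = std_normal.inv_cdf(0.75)
iqr = q3 - q1                                    # about 1.35 sigma
upper_fence = q3 + 1.5 * iqr                     # about 2.70 sigma
share_outside = 1.0 - share_within(upper_fence)  # fences are symmetric around the mean

print(f"IQR = {iqr:.3f} sigma, fences at +/-{upper_fence:.3f} sigma")
print(f"share flagged as outliers: {share_outside:.4f}")
```

So for normally distributed data the whisker limits sit at roughly 2.7 standard deviations from the mean, and only about 0.7% of the points fall beyond them.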

**Note:** A box plot can also be constructed on non-normal data.

**Let us now understand more about Boxplot**

The basic format of the box plot uses a box to convey the middle 50% of the data. The length of this region is called the interquartile range (IQR).

Several variations on the traditional box plot exist. The most common among them are the variable width box plot and the notched box plot.

- **Variable Width Box Plot**: It illustrates the size of each group of data by making the width of the box proportional to the size of the group
- **Notched Box Plot**: It narrows the box into a "notch" around the median. The width of the notch is proportional to the IQR of the sample and inversely proportional to the square root of the size of the sample
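The notch relationship can be illustrated with a small calculation. This is only a sketch under stated assumptions: the constant 1.57 is a commonly used choice that I am assuming here (the text above only states the proportionality), the function name `notch_limits` is hypothetical, and the values are made up for the example.

```python
import math
import statistics

def notch_limits(values, constant=1.57):
    """Return (lower, upper) notch limits around the median.

    Half-width = constant * IQR / sqrt(n). The constant 1.57 is an assumption.
    """
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    half_width = constant * (q3 - q1) / math.sqrt(len(values))
    return med - half_width, med + half_width

small_sample = [48, 52, 55, 58, 60, 63, 67, 70]   # hypothetical values
large_sample = small_sample * 4                   # same spread, four times the size

small_low, small_high = notch_limits(small_sample)
large_low, large_high = notch_limits(large_sample)

print(f"n={len(small_sample)}: notch width {small_high - small_low:.2f}")
print(f"n={len(large_sample)}: notch width {large_high - large_low:.2f}")
```

Because the width shrinks with the square root of the sample size, quadrupling the number of points while keeping the same IQR halves the notch.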

**Math Behind Box Plot:**

Consider a sample of 10 data points:

11, 16, 12, 17, 14, 12, 16, 17, 13, 20

**Order the data from smallest to largest**

11, 12, 12, 13, 14, 16, 16, 17, 17, 20

**Find the Median**

The median is the middlemost value for an odd number of data points, or the average of the two middle values for an even number of data points. Here there are 10 data points, so the median is the average of the 5th and 6th values.

Median = (14 + 16) / 2 = 15

**Find the Quartiles**

The first quartile is the median of the data points to the left of the median, i.e. 11, 12, 12, 13, 14

Q1 = 12

The third quartile is the median of the data points to the right of the median, i.e. 16, 16, 17, 17, 20

Q3 = 17

**Complete the five-number summary by finding the minimum and maximum value in the dataset**

- The minimum is the smallest data point, which is 11
- The maximum is the largest data point, which is 20
- The five-number summary is 11, 12, 15, 17, 20
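The steps above can be sketched in Python as follows. The function name `five_number_summary` is my own. It follows the median-of-halves method used in this worked example, so library functions that interpolate between points may return slightly different quartiles.

```python
import statistics

def five_number_summary(values):
    """Five-number summary using the median-of-halves method.

    Assumes at least two values. For an odd number of points the
    middle value is left out of both halves.
    """
    data = sorted(values)
    half = len(data) // 2
    lower_half = data[:half]    # points to the left of the median
    upper_half = data[-half:]   # points to the right of the median
    return (
        data[0],
        statistics.median(lower_half),
        statistics.median(data),
        statistics.median(upper_half),
        data[-1],
    )

sample = [11, 16, 12, 17, 14, 12, 16, 17, 13, 20]
summary = five_number_summary(sample)
print(summary)   # (11, 12, 15.0, 17, 20)
```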

**Construction of a Box Plot Using the Above Data**

Mark an axis that fits the above five-number summary.

[Figure: box plot axis]

Draw a box from Q1 = 12 to Q3 = 17 with a vertical line through the median (15).

[Figure: box plot demarcating median and quartiles]

Draw a whisker from Q1 to the minimum and from Q3 to the maximum. Here, Min = 11 and Max = 20.

[Figure: box plot demarcating whiskers]

**Interpreting the Quartiles**

The five-number summary divides the data into four sections, each containing about 25% of the data in the set.

[Figure: box plot]

Since Q1 = 12, about 25% of the data is lower than 12 and about 75% is above 12.

**Outliers:**

If the data happens to be normally distributed, then IQR ≈ 1.35σ, where σ is the standard deviation.

Outliers are data points that lie more than 1.5 × IQR above the third quartile or below the first quartile.

Note that outliers are not necessarily "bad"; they may be the most important and most informative part of the dataset. They should not be removed without proper verification. Outliers require special treatment: they may be the key to the study, or they may be the result of human error.
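As a rough illustration of the 1.5 × IQR rule, the sketch below flags outliers in the 10-point sample from the math section. The function name `iqr_outliers` and the extra value 30 are hypothetical, added only to show a point being flagged. Quartiles are computed with the same median-of-halves method as in the worked example.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using the k * IQR rule."""
    data = sorted(values)
    half = len(data) // 2
    q1 = statistics.median(data[:half])    # median of the lower half
    q3 = statistics.median(data[-half:])   # median of the upper half
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return lower_fence, upper_fence, outliers

sample = [11, 16, 12, 17, 14, 12, 16, 17, 13, 20]
print(iqr_outliers(sample))          # (4.5, 24.5, [])
print(iqr_outliers(sample + [30]))   # (4.5, 24.5, [30]); 30 is a hypothetical extra value
```

The original sample has no outliers, so its whiskers end at the actual minimum (11) and maximum (20). Once 30 is added, it falls above the upper limit of 24.5 and would be drawn as a separate dot.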

With the help of the box plot, I tried to derive insights for 10 different universities. Here, I will explain using one feature (Salaries) from the university data. (Please note that I have masked the data points for this article, as the data contains some sensitive information.)

I have used the data science tools R and Python to come up with these insights.

**R programming:**

Load the required packages:

- “readxl”: The package used to read an Excel file
- “read_excel”: The function that reads an Excel file
- “file.choose()”: Opens a file-selection dialog (GUI); its result is passed as the argument to “read_excel” to load the dataset
- “attach()”: Attaches the data frame to the search path, so that columns can be called directly by name without referring to the table name in R
- “names()”: The function that shows the column names

I created a box plot to identify outliers in the data. Removing them will affect the average salary.

[Figure: box plot code in R]

The resulting box plot shows that there are some outliers, which are influencing the mean calculation.

[Figure: box plot output]

**Calculation of IQR:** I need Q1 and Q3, for which the built-in function quantile() is used to calculate the 0.25 and 0.75 quantiles respectively.

[Figure: using the quantile function]

Outliers are the values below (Q1 - 1.5 * IQR) or above (Q3 + 1.5 * IQR).

**Calculation of Outliers:**

[Figure: code for outliers]

Now I have the outlier data, which I append to the original file.

[Figure: appending outliers data]

*I will share this final analysis report loaded into the file Outliers.csv with my friend so that he can now make a wise decision of choosing the right university.*

**I can achieve the same using Python Programming as well**

**Python Programming**

Import the required libraries and import the dataset.

[Figure: importing Python libraries]

[Figure: importing the dataset]

In Python, I used the boxplot() function from the seaborn library.

[Figure: Python code for box plot]

I also used swarmplot() to get a better representation of the distribution of the data. However, this representation is not ideal when the dataset is large.

[Figure: swarmplot function]

The output shows the distribution of data points along with the box plot.

[Figure: box plot output]

We can see that there are some outliers; however, we need to know which values they are.

In order to calculate outliers mathematically, we need the IQR (Inter Quartile Range), which is IQR = Q3 (Quartile 3) – Q1 (Quartile 1), i.e. the difference between the 75th and 25th percentiles.

In Python, a quantile() function (available in pandas, for example) calculates percentiles; the IQR is then calculated from Q1 and Q3.

[Figure: quantile function]

With the IQR, I calculate outliers using the limits (Q1 – 1.5 * IQR) and (Q3 + 1.5 * IQR). The Python code returns a Boolean for each value: True if it is an outlier, False otherwise.

[Figure: code for outliers]
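Since the original code is not reproduced here, the following is only a minimal sketch of how this calculation might look with pandas. The salary values and the variable names are hypothetical; the article's real data is masked.

```python
import pandas as pd

# Hypothetical salaries; the article's real (masked) data is not reproduced here.
salaries = pd.Series(
    [60000, 62000, 65000, 70000, 72000, 75000, 80000, 250000],
    name="Salaries",
)

q1 = salaries.quantile(0.25)
q3 = salaries.quantile(0.75)
iqr = q3 - q1

lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Boolean Series: True where a salary lies outside the limits
is_outlier = (salaries < lower_limit) | (salaries > upper_limit)

print(salaries[is_outlier])
print("mean with outliers:   ", salaries.mean())
print("mean without outliers:", salaries[~is_outlier].mean())
print("median:               ", salaries.median())
```

Note that pandas interpolates linearly between data points by default, so its quartiles can differ slightly from the median-of-halves method used in the hand calculation. In this made-up example, one extreme salary raises the mean from roughly 69,000 to 91,750, while the median stays at 71,000.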

[Figure: salaries of university students after box plot analysis]

**Conclusion:** Data science is all about sharing findings with audiences that might not be familiar with these undiscovered ideas. I was able to understand things from the analysis that I would not have otherwise. The outcomes of the statistical computations should guide my friend's decision-making.

I would warn my buddy not to be duped by the inflated salaries that certain universities are quoting.
