Simple Linear Regression: Introduction & Applications
Bharani Kumar Depuru is a well-known IT personality from Hyderabad. He is the Founder and Director of AiSPRY and 360DigiTMG. Bharani Kumar is an alumnus of IIT and ISB with more than 18 years of experience, and he has held prominent positions at leading IT firms such as HSBC, ITC Infotech, Infosys, and Deloitte. He is an IT consultant specializing in Industrial Revolution 4.0 implementation, Data Analytics practice setup, Artificial Intelligence, Big Data Analytics, Industrial IoT, Business Intelligence and Business Management. Bharani Kumar is also the chief trainer at 360DigiTMG, with more than ten years of experience, and has been making the IT transition journey easy for his students. 360DigiTMG is at the forefront of delivering quality education, thereby bridging the gap between academia and industry.
Suppose you are given the task of solving a mystery. But instead of a magnifying glass and trench coat, you are armed with data and a curious mind. Your mission is to uncover the hidden connection between two variables and reveal the story lurking beneath the surface. Welcome to the world of simple linear regression, where data is your clue and the regression equation is your secret decoder.
Welcome to my new blog on one of the most important and widely used supervised techniques: Simple Linear Regression. Supervised learning techniques, a subset of machine learning, are widely used in real-world applications across different domains. These techniques involve training models on labelled data to make predictions or classifications. Simple Linear Regression is one such technique. It is a fundamental statistical method used to understand the relationship between two continuous variables by establishing a linear relationship between a dependent and an independent variable. The goal is to find the best-fitting straight line, also known as the regression line or the least-squares line, that describes the relationship between these two variables. We will dive deep into the concepts in this blog.
Simple Linear Regression is a supervised learning technique that is used to model the relationship between two continuous variables. It assumes that there is a linear relationship between a dependent variable Y and an independent variable X. Finding a linear equation that best fits the data is the goal.
The equation for the Simple linear Regression is
Y=aX+b
Where
Y is the dependent variable.
X is the independent variable.
a is the slope of the regression line, representing the change in Y for a unit change in X.
b is the intercept, representing the value of Y when X is equal to 0.
The primary objective of simple linear regression is to estimate the values of a and b that minimize the sum of the squared differences between the observed values of Y and the values predicted by the linear equation. This is often referred to as the method of least squares.
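To make the least-squares idea concrete, here is a minimal sketch in plain Python. It uses the standard closed-form solution, where a is the sum of (x - mean of X)*(y - mean of Y) divided by the sum of (x - mean of X)^2, and b is mean of Y minus a times mean of X. The function name fit_line and the sample points are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (a, b) that minimise the sum of squared errors for Y = a*X + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of X and Y around their means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Variation of X around its mean
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx              # slope
    b = mean_y - a * mean_x    # intercept
    return a, b


# Made-up points that lie exactly on Y = 2X + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
a, b = fit_line(xs, ys)
print(a, b)  # 2.0 1.0
```

Because the sample points lie exactly on a line, the fit recovers the slope and intercept exactly. With real, noisy data the line will not pass through every point.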
If you are using R, the lm() function is used to fit a Simple Linear Regression model, i.e. model <- lm(Y ~ X, data = data), where X is the independent variable and Y is the dependent variable.
If you are using Python, the scikit-learn library can be used to fit a Simple Linear Regression model.
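As a rough illustration of the Python route, the sketch below fits a line with scikit-learn's LinearRegression. The data points are invented for the example, and scikit-learn expects X as a two-dimensional array with one column per feature.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Invented sample data: one feature (X) and one target (Y)
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])  # shape (n_samples, 1)
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

model = LinearRegression()
model.fit(X, Y)

slope = model.coef_[0]        # a in Y = aX + b
intercept = model.intercept_  # b in Y = aX + b
print("slope:", slope, "intercept:", intercept)

# Predict Y for a new value of X
prediction = model.predict(np.array([[6.0]]))
print("prediction for X = 6:", prediction[0])
```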
Now let us check some interesting facts about Simple Linear Regression from Google Trends:
Worldwide, the term shows an average interest score of about 60 per month. Note that Google Trends reports relative interest on a scale of 0 to 100 rather than a raw count of searches.
Let us check the countries where simple linear regression is searched for the most.
Ethiopia has a score of 100, the highest relative interest. These facts show that supervised learning concepts are searched for all over the world.
The steps followed in Simple Linear Regression are
OLS stands for Ordinary least squares, and it is a method used in simple linear regression to estimate the parameters of a linear relationship between two variables. It finds the line that best fits the data by minimizing the sum of the squared residuals.
1.Number of Independent Variables:
2.Model Complexity:
3.Equation:
Now let us solve a problem using Simple Linear Regression:
We are using a waist circumference and adipose tissue dataset for our analysis. We need to predict the adipose tissue (AT) of the body based on waist circumference (Waist).
Let us look at our dataset.
Let us do the coding part step by step.
Step 1. Importing necessary libraries and importing the dataset.
Reading the CSV file gives us a DataFrame with the two columns Waist and AT.
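The original code for this step is not reproduced here, so the following is only a sketch of how the data could be loaded with pandas. The column names Waist and AT come from this post, while the rows are invented and read from an in-memory string so that the example runs without the real file. With the actual dataset you would pass the file path to pd.read_csv instead.

```python
import io

import pandas as pd

# Invented rows standing in for the real CSV file (illustration only)
csv_text = """Waist,AT
70.0,25.0
80.0,60.0
90.0,95.0
100.0,130.0
110.0,165.0
"""

# With the real dataset this would be pd.read_csv("<path to your csv file>")
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())
```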
Step 2. Checking dimensions of data and getting data description.
As we can see, the data dimension is (109, 2), which means the dataset has 109 rows and 2 columns. The description also reports summary statistics such as the mean, standard deviation, minimum and maximum of each column.
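A minimal sketch of this step is shown below, assuming the data is already in a pandas DataFrame. The small DataFrame here is invented so that the snippet is self-contained, so its shape is (6, 2) rather than (109, 2).

```python
import pandas as pd

# Invented stand-in for the real dataset (illustration only)
df = pd.DataFrame({
    "Waist": [70.0, 75.0, 82.0, 90.0, 101.0, 110.0],
    "AT": [25.0, 40.0, 62.0, 98.0, 140.0, 170.0],
})

print(df.shape)          # (rows, columns)
summary = df.describe()  # count, mean, std, min, quartiles, max per column
print(summary)

# A quick check for missing values before model building
print(df.isna().sum())
```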
After preprocessing, we are going to do model building.
Step 3. Importing required libraries for Simple linear Regression and fitting our model.
The statsmodels package is imported, and its ols function is used to estimate the coefficients of the linear regression equation that relates the dependent and independent variables.
In our case Waist is the independent variable and AT is the dependent variable. The ols method minimizes the sum of squared errors between the observed and predicted values.
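As a sketch of what this step could look like with the statsmodels formula API, the code below fits AT ~ Waist. The data is synthetic, generated only so that the example runs on its own, so the coefficients it prints will not match the values reported in this post.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the Waist/AT data (illustration only)
rng = np.random.default_rng(0)
waist = rng.uniform(65, 120, size=109)
at = 3.5 * waist - 216 + rng.normal(0, 30, size=109)
df = pd.DataFrame({"Waist": waist, "AT": at})

# "AT ~ Waist" means: model AT as intercept + slope * Waist
model = smf.ols("AT ~ Waist", data=df).fit()

print(model.params)     # Intercept and Waist coefficient
print(model.rsquared)   # share of variance in AT explained by the model
print(model.summary())  # full regression table
```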
Now let us check the output and get the summary of the fitted model.
The equation we get from our model is AT = 3.4589*Waist - 215.9815. R-squared and adjusted R-squared measure the goodness of fit. Here about 67% of the variability in AT is explained by the model, so the model fits reasonably well.
Step 4: Now check the predicted values and find out the error
The root mean squared error (RMSE) of this model is 32.76.
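The error used to compare models in this post is the RMSE, the square root of the average squared difference between observed and predicted values. Here is a minimal sketch of the calculation with NumPy. The helper name rmse and the sample numbers are made up for illustration.

```python
import numpy as np


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))


# Made-up values: the errors are -10, 10 and -5
observed = [100.0, 150.0, 200.0]
predicted = [110.0, 140.0, 205.0]
error = rmse(observed, predicted)
print(error)  # about 8.66, the square root of 75
```

RMSE is expressed in the same units as the dependent variable, which makes it easy to compare across models fitted to the same data.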
Step 5. Now let us try with some transformation.
We can see that the R-squared value is 0.675, which is higher than in the earlier model.
The RMSE is also slightly lower than in the first model.
We can still see that many observed points do not lie on the predicted line.
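The transformation used in this step is not shown in the text, so the sketch below assumes, purely for illustration, a log transformation of the independent variable (AT ~ log(Waist)) and compares it with the plain linear model. The data is synthetic, so the R-squared and RMSE values will differ from those quoted here.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data with a mildly curved relationship (illustration only)
rng = np.random.default_rng(1)
waist = rng.uniform(65, 120, size=109)
at = 300 * np.log(waist) - 1250 + rng.normal(0, 30, size=109)
df = pd.DataFrame({"Waist": waist, "AT": at})

# Model 1: plain linear fit. Model 2: log-transformed predictor (assumed).
linear_model = smf.ols("AT ~ Waist", data=df).fit()
log_model = smf.ols("AT ~ np.log(Waist)", data=df).fit()

for name, fitted in [("linear", linear_model), ("log(Waist)", log_model)]:
    # Residuals are observed minus predicted AT, so this is the RMSE
    model_rmse = float(np.sqrt(np.mean(fitted.resid ** 2)))
    print(name, "R-squared:", round(fitted.rsquared, 3), "RMSE:", round(model_rmse, 2))
```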
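Step 6: Let us apply a polynomial transformation.
This model is better than the previous one because its R-squared value is higher. The equation of this model is log(AT) = -7.8241 + 0.2289*Waist - (0.0010*Waist*Waist). With these coefficients the right-hand side is on the log scale of AT (for a waist of 90 it evaluates to about 4.7), so predictions are exponentiated to obtain AT.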
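Below is a hedged sketch of a polynomial fit with statsmodels. It assumes the response is modelled on the log scale, which the size of the coefficients in the equation suggests, and it uses synthetic data generated from roughly similar coefficients so that the example is self-contained.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data, loosely shaped like the equation in this post (illustration only)
rng = np.random.default_rng(2)
waist = rng.uniform(65, 120, size=109)
log_at = -7.8 + 0.229 * waist - 0.001 * waist ** 2 + rng.normal(0, 0.3, size=109)
df = pd.DataFrame({"Waist": waist, "AT": np.exp(log_at)})

# I(...) tells the formula parser to treat Waist**2 as arithmetic,
# giving a quadratic (degree 2 polynomial) term in Waist
poly_model = smf.ols("np.log(AT) ~ Waist + I(Waist**2)", data=df).fit()

print(poly_model.params)    # intercept, linear and quadratic coefficients
print(poly_model.rsquared)  # R-squared on the log(AT) scale
```

Note that an R-squared computed on log(AT) is not directly comparable with one computed on AT, which is one reason to also compare models by RMSE on the original scale.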
Step 7: Let us check the predicted values and calculate the RMSE.
As we can see, most of the observed values lie on or close to the line, and the RMSE is 32.24, which is lower than for the other models we have seen.
Thus, the model with the polynomial transformation gives the least error. The errors for the train and test data are both low and close to each other, so this model is the right fit.
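To illustrate the train and test comparison, the sketch below splits the data, fits the polynomial model on the training part and computes the RMSE on both parts. The 80/20 split, the log-scale response and the synthetic data are all assumptions made for this example, and the helper name rmse_on is hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Waist/AT data (illustration only)
rng = np.random.default_rng(3)
waist = rng.uniform(65, 120, size=109)
log_at = -7.8 + 0.229 * waist - 0.001 * waist ** 2 + rng.normal(0, 0.3, size=109)
df = pd.DataFrame({"Waist": waist, "AT": np.exp(log_at)})

# Assumed 80/20 split between training and test rows
train, test = train_test_split(df, test_size=0.2, random_state=0)

model = smf.ols("np.log(AT) ~ Waist + I(Waist**2)", data=train).fit()


def rmse_on(data):
    """RMSE on the original AT scale for the rows in data."""
    predicted_at = np.exp(model.predict(data))  # undo the log transform
    return float(np.sqrt(np.mean((data["AT"] - predicted_at) ** 2)))


train_rmse = rmse_on(train)
test_rmse = rmse_on(test)
print("train RMSE:", train_rmse, "test RMSE:", test_rmse)
```

If the test error were much larger than the train error, that would point to overfitting. Similar values on both sets suggest the model generalises reasonably well.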
The following are some of the areas where Simple Linear Regression is used
There are many other areas where linear relationships between variables hold.
In this blog we learnt what a Simple Linear Regression model is, why it is widely used in Data Science, how it is applied in the real world, and how to code an example with the waist circumference and adipose tissue dataset. In conclusion, simple linear regression offers insight into the linear relationship between two variables, making it a valuable tool for understanding and predicting outcomes in various fields. Its simplicity, interpretability, and historical significance ensure that it will remain a relevant statistical technique, used alongside more advanced methods in data analysis and decision-making. Simple linear regression can also be integrated as a feature or as part of a larger predictive modelling process in machine learning and artificial intelligence applications. That is what makes it one of the most popular supervised models in the field of Data Science. Thanks for having the patience to read my blog. If you liked it, feel free to give us feedback in the comment section.