Overfitting and Underfitting
Bharani Kumar Depuru is a well-known IT personality from Hyderabad. He is the Founder and Director of AiSPRY and 360DigiTMG. An IIT and ISB alumnus with more than 17 years of experience, he has held prominent positions at leading IT firms such as HSBC, ITC Infotech, Infosys, and Deloitte. He is a prominent IT consultant specialising in Industrial Revolution 4.0 implementation, Data Analytics practice setup, Artificial Intelligence, Big Data Analytics, Industrial IoT, Business Intelligence, and Business Management. Bharani Kumar is also the chief trainer at 360DigiTMG, with more than ten years of training experience, and has been making the IT transition journey easy for his students. 360DigiTMG is at the forefront of delivering quality education, bridging the gap between academia and industry.
Overfitting occurs when a model performs very well on training data but poorly on test data. Along with the genuine patterns, the model also learns the noise in the training data, which has a detrimental impact on its performance on test data. The overfitting issue typically arises when using nonlinear models with a nonlinear decision boundary. In an SVM, for instance, the decision boundary can be a hyperplane or a linearly separating line.
In such cases the underlying pattern is clearly nonlinear, and the model's results cannot be generalised to new data.
Nonlinear models, such as decision trees, frequently overfit through the decision boundary they produce. This is also known as a high-variance problem. Using target shooting as an analogy, high variance is like shots scattered all around the target instead of clustering on the bullseye. Overfitting results in a large validation/test error and a comparatively small training error, as the sketch below illustrates.
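The short example below is a minimal sketch, not taken from this article: it assumes scikit-learn and a synthetic dataset, fits an unconstrained decision tree, and compares training and test accuracy. A large gap between the two scores is the signature of overfitting.

```python
# Minimal sketch (assumption: scikit-learn, synthetic data) of the overfitting
# signature: near-perfect training accuracy but a noticeably lower test accuracy.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, flip_y=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

tree = DecisionTreeClassifier(random_state=42)  # no depth limit -> high variance
tree.fit(X_train, y_train)

print("train accuracy:", tree.score(X_train, y_train))  # typically close to 1.0
print("test accuracy :", tree.score(X_test, y_test))    # typically noticeably lower
```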
Cross-validation is the classic preventative approach for addressing overfitting. The entire dataset is split into k subsets (folds) of roughly equal size. In the first iteration, the algorithm trains on k-1 folds, the first fold is used as test data, and the test error is calculated.
In the second iteration, the second fold is chosen as the test set, the remaining k-1 folds are used as training data, and the test error is calculated again.
The procedure repeats until each of the k folds has been used as the test set exactly once.
For k = 5, for example, each of the five folds serves as the test set once while the other four folds are used for training.
Either way, we can tune the number of folds to select the best k for addressing overfitting, as sketched below.
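Here is a minimal sketch of k-fold cross-validation, assuming scikit-learn and a synthetic regression dataset (neither is used in this article); it tries a few values of k and reports the averaged cross-validated error for each.

```python
# Minimal sketch (assumption: scikit-learn, synthetic data) of k-fold
# cross-validation, trying a few values of k.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)

for k in (3, 5, 10):
    cv = KFold(n_splits=k, shuffle=True, random_state=0)
    scores = cross_val_score(LinearRegression(), X, y, cv=cv,
                             scoring="neg_mean_squared_error")
    # Each fold is used as the test set once; we report the averaged test error.
    print(f"k = {k:2d}  mean CV mean-squared error: {-scores.mean():.2f}")
```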
Cross-validation alone will not always work. We can also try a less powerful model with fewer parameters, and data augmentation can sometimes help as well.
Overfitting may also be avoided through feature engineering and feature selection. We often add extra features to a model in an effort to increase its accuracy, but doing so can overcomplicate the model and cause overfitting, so keeping only the informative features matters (see the sketch below).
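The sketch below is one hedged illustration of feature selection, assuming scikit-learn; SelectKBest is just one of many possible selection techniques, and the dataset is synthetic.

```python
# Minimal sketch (assumption: scikit-learn, synthetic data): univariate feature
# selection, keeping only the strongest features so that noisy extra inputs do
# not push the model towards overfitting.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

X, y = make_classification(n_samples=400, n_features=25, n_informative=5, random_state=1)

selector = SelectKBest(score_func=f_classif, k=5)  # keep the 5 highest-scoring features
X_selected = selector.fit_transform(X, y)
print(X.shape, "->", X_selected.shape)  # (400, 25) -> (400, 5)
```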
Regularisation keeps the parameter values as small as possible so that the model stays as simple as feasible. A strongly regularised complex model will often perform better than an initially simple one. By shrinking the parameters, the regularisation approach prevents the model from overlearning the patterns in the data. The tuning parameter (hyperparameter) is what helps achieve the proper fit, and different machine learning algorithms have different hyperparameters: dropout in neural networks, the pruning strategy and ccp_alpha in decision trees, the maximum tree depth, L1/L2 penalties in regression, and so on.
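As a small illustration, the sketch below shows two of these regularisation knobs in code. It assumes scikit-learn, and the specific values of alpha, ccp_alpha, and max_depth are arbitrary choices for demonstration only.

```python
# Minimal sketch (assumption: scikit-learn, synthetic data, arbitrary settings)
# of two regularisation knobs: the L2 penalty in ridge regression and
# cost-complexity pruning plus a depth limit in a decision tree.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=15, noise=5.0, random_state=0)

ridge = Ridge(alpha=1.0)  # larger alpha shrinks the coefficients further
tree = DecisionTreeRegressor(max_depth=4, ccp_alpha=0.01)  # shallower, pruned tree

ridge.fit(X, y)
tree.fit(X, y)
print("ridge coefficient magnitudes:", abs(ridge.coef_).round(2))
print("tree depth after pruning    :", tree.get_depth())
```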
Please click the following link to learn about pruning techniques.
https://360digitmg.com/decision-trees-and-its-algorithms
Bagging-based ensembles such as Random Forest, as well as boosting, can be used to tackle variance problems.
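The sketch below, assuming scikit-learn and a synthetic dataset, compares a single deep tree with a bagged Random Forest using 5-fold cross-validation; averaging many bootstrapped trees usually gives a higher and more stable score.

```python
# Minimal sketch (assumption: scikit-learn, synthetic data): a single deep tree
# versus a bagged ensemble (Random Forest), compared with 5-fold cross-validation.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, flip_y=0.1, random_state=7)

models = {
    "single tree  ": DecisionTreeClassifier(random_state=7),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=7),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}  mean CV accuracy: {scores.mean():.3f}  (std {scores.std():.3f})")
```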
Underfitting occurs when a model does not learn the patterns in the training data well enough to generalise to unseen data. The model learns the relationship between the input and output variables inaccurately. This happens when the model is overly simplistic, or when it needs more training time, more input features, and so on. Both the training error and the validation/test error are large.
An underfit model produces predictions that are inaccurate and biased. Compared with overfitting, underfitting is less of a problem because it can be readily detected and fixed. Applying an algorithm to too small a dataset can also lead to inaccurate predictions.
By including more input features, we can make the model more complex and better capture the relationship between the variables. We can try this out by building polynomial models of degree 2, degree 3, and so on, as in the sketch below.
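The sketch below, assuming scikit-learn and NumPy with a synthetic cubic relationship, fits polynomial models of increasing degree; the straight-line fit underfits, while the higher degrees capture the curve.

```python
# Minimal sketch (assumption: scikit-learn, NumPy, synthetic data): fitting
# polynomial models of increasing degree to a cubic relationship.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X[:, 0] ** 3 - X[:, 0] + rng.normal(scale=1.0, size=200)  # cubic pattern plus noise

for degree in (1, 2, 3):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X, y)
    # A straight line (degree 1) underfits; degree 3 captures the curve.
    print(f"degree {degree}: R^2 on training data = {model.score(X, y):.3f}")
```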
Underfitting can also be fixed by adding capacity in a sequential manner. For instance, increasing the number of hidden neurons in a neural network or the number of trees in a random forest adds complexity to the model and improves training outcomes.
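As one small, hedged illustration of adding capacity step by step (assuming scikit-learn; the layer widths are arbitrary), the sketch below widens the hidden layer of a small neural network and reports the training accuracy.

```python
# Minimal sketch (assumption: scikit-learn, synthetic data, arbitrary widths):
# widening the hidden layer step by step and checking the training accuracy.
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=400, n_features=20, n_informative=10, random_state=3)

for width in (2, 8, 32):
    mlp = MLPClassifier(hidden_layer_sizes=(width,), max_iter=1000, random_state=3)
    mlp.fit(X, y)
    print(f"{width:3d} hidden neurons -> training accuracy {mlp.score(X, y):.3f}")
```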
Stopping the training too soon does not allow the algorithm to learn the patterns completely. In neural networks we can increase the number of epochs, but it is important to get the amount of training right, because training for too long may instead run into overfitting.
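Below is a minimal sketch of training a small network for more epochs, assuming TensorFlow/Keras and synthetic data; the architecture and epoch count are arbitrary. The validation split helps show the point where extra epochs stop helping and overfitting begins.

```python
# Minimal sketch (assumption: TensorFlow/Keras, synthetic data, arbitrary
# architecture and epoch count): training a small network for more epochs and
# watching the validation accuracy alongside the training accuracy.
import numpy as np
from tensorflow import keras

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10)).astype("float32")
y = (X[:, 0] + X[:, 1] ** 2 > 0.5).astype("float32")

model = keras.Sequential([
    keras.Input(shape=(10,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

history = model.fit(X, y, epochs=50, validation_split=0.2, verbose=0)
print("final train accuracy     :", history.history["accuracy"][-1])
print("final validation accuracy:", history.history["val_accuracy"][-1])
```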
Regularisation helps lower the variance of a model by imposing a penalty on the input parameters with the largest coefficients. A variety of methods, including L1/L2 regularisation, can be used to reduce the influence of noise and outliers. However, if the regularisation is too strong, the model is constrained so much that it cannot recognise the dominant trend, which results in underfitting. Reducing the regularisation level restores complexity and variance to the model, enabling effective training.
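The sketch below, assuming scikit-learn, a synthetic dataset, and arbitrary penalty values, shows how relaxing a too-strong L1 penalty lets the model pick up the real trend again.

```python
# Minimal sketch (assumption: scikit-learn, synthetic data, arbitrary alphas):
# a large alpha shrinks the lasso coefficients heavily (often to zero) and can
# underfit; smaller alpha values restore complexity.
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=300, n_features=10, n_informative=4, noise=5.0, random_state=2)

for alpha in (100.0, 1.0, 0.01):
    lasso = Lasso(alpha=alpha).fit(X, y)
    n_used = int((lasso.coef_ != 0).sum())
    print(f"alpha = {alpha:6.2f}  features used: {n_used:2d}  training R^2: {lasso.score(X, y):.3f}")
```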
The best fit is the scenario in which the model predicts with as little error as possible on both training and test data. A model may initially fail to capture the relationship between x and y, so we add features to improve the pattern learning; but if we keep adding features to reduce underfitting, the model eventually becomes more and more complex and ends up overfitting.
The other side of this is that, as training time and the number of inputs grow, the error on the training data keeps decreasing and the test error initially decreases as well. If this continues too far, however, the test error starts to rise again and the model becomes overfitted.
So choosing the right set of features, the right amount of training, and the right regularisation penalty terms helps us achieve the right fit, the best fit.
For every model, we try to find the ideal balance between bias and variance. This ensures that we capture the key patterns in our data while disregarding the noise. This is described as the bias-variance tradeoff, and it helps keep our model's error as low as feasible.
An optimised model will be sensitive to the patterns in our data while also being able to generalise to new data. It should have low bias and low variance to avoid both underfitting and overfitting. Therefore, achieving low bias and low variance is our goal.
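For reference, the standard bias-variance decomposition of the expected squared error (a textbook identity, stated here for completeness rather than taken from this article) is:

```latex
\mathbb{E}\left[\left(y - \hat{f}(x)\right)^2\right]
  = \underbrace{\left(f(x) - \mathbb{E}\left[\hat{f}(x)\right]\right)^2}_{\text{Bias}^2}
  + \underbrace{\mathbb{E}\left[\left(\hat{f}(x) - \mathbb{E}\left[\hat{f}(x)\right]\right)^2\right]}_{\text{Variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```

Overfitting corresponds to the variance term dominating, underfitting to the bias term dominating, and the irreducible error is the floor no model can go below.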