Every model's predictions contain some error. That error has a reducible part and an irreducible part. The reducible part, made up of bias and variance, is the part we can actually work on when we choose and train a model.
Understanding bias, variance, underfitting, overfitting and the tradeoff between them helps us build the most accurate model we can.
So let's discuss these terms one by one and see how they affect a model.
What is Bias?
- Bias is the difference between the average prediction of our model (averaged over models trained on different training sets) and the correct value.
- Bias measures how far off, in general, the model's predictions are from the correct value.
- A model with high bias underfits and performs poorly on both the training and the test dataset.
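To make the definition concrete, here is a minimal Python sketch (NumPy only) that estimates bias by simulation. Everything in it is an illustrative assumption, not part of the text above: a hypothetical true function sin(2πx), Gaussian noise with standard deviation 0.3, 30 fixed training inputs, and polynomial models fitted with `np.polyfit`. Each repetition draws fresh noise, refits the model and records its prediction at one point; bias is the average of those predictions minus the correct value.

```python
import numpy as np

def true_f(x):
    # Hypothetical "correct" function, chosen only for illustration.
    return np.sin(2 * np.pi * x)

def predictions_at(x0, degree, n_sets=200, n_points=30, noise=0.3, seed=0):
    """Fit a polynomial of the given degree to many noisy training sets
    (same inputs, fresh noise each time) and return its predictions at x0."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0, 1, n_points)
    preds = np.empty(n_sets)
    for i in range(n_sets):
        y = true_f(x) + rng.normal(0, noise, n_points)
        coeffs = np.polyfit(x, y, degree)
        preds[i] = np.polyval(coeffs, x0)
    return preds

x0 = 0.25  # the correct value here is sin(pi / 2) = 1
biases = {}
for degree in (0, 1, 3):
    preds = predictions_at(x0, degree)
    # Bias = average prediction minus the correct value.
    biases[degree] = preds.mean() - true_f(x0)
    print(f"degree {degree}: bias at x0 = {biases[degree]:+.3f}")
```

The constant model (degree 0) ignores the input, so its average prediction stays near 0 and it misses by roughly the whole correct value; the cubic follows the curve closely and its bias is much smaller.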
What is Variance?
- Variance is the variability of the model's prediction for a given data point: how much that prediction changes when the model is trained on different training sets. It tells us how spread out the predictions are.
- Models with high variance perform very well on training data but have high error rates on test data.
- A high-variance model overfits.
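Variance can be estimated the same way, by looking at the spread of predictions instead of their average. The sketch below is again built on assumptions of our own (a hypothetical true function sin(2πx), noise standard deviation 0.3, 20 fixed training inputs, 500 training sets differing only in their noise) and compares a straight line with a degree-9 polynomial.

```python
import numpy as np

rng = np.random.default_rng(1)
x_train = np.linspace(0, 1, 20)          # fixed training inputs
x_test = np.linspace(0, 1, 50)           # points where we measure the spread
f_train = np.sin(2 * np.pi * x_train)    # hypothetical true function

def prediction_variance(degree, n_sets=500, noise=0.3):
    """Average (over x_test) variance of the model's predictions across
    training sets that differ only in their noise."""
    preds = np.empty((n_sets, x_test.size))
    for i in range(n_sets):
        y = f_train + rng.normal(0, noise, x_train.size)
        preds[i] = np.polyval(np.polyfit(x_train, y, degree), x_test)
    # Variance across training sets at each test point, then averaged.
    return preds.var(axis=0).mean()

variances = {degree: prediction_variance(degree) for degree in (1, 9)}
for degree, var in variances.items():
    print(f"degree {degree}: average prediction variance = {var:.4f}")
```

Both models see exactly the same kind of data; the degree-9 polynomial's predictions simply swing much more from one training set to the next.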
Define Error Mathematically:
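A standard way to write this, assuming the data are generated as $y = f(x) + \varepsilon$ with noise $\varepsilon$ of mean 0 and variance $\sigma^2$, and $\hat f$ is the model fitted on a randomly drawn training set, is the expected squared error at a point $x$:

$$
\mathrm{Err}(x) = \mathbb{E}\big[(y - \hat f(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat f(x)] - f(x)\big)^2}_{\text{Bias}^2}
+ \underbrace{\mathbb{E}\Big[\big(\hat f(x) - \mathbb{E}[\hat f(x)]\big)^2\Big]}_{\text{Variance}}
+ \underbrace{\sigma^2}_{\text{Irreducible error}}
$$

The cross terms vanish because $\hat f(x) - \mathbb{E}[\hat f(x)]$ averages to zero and because the new noise $\varepsilon$ has mean zero and is independent of $\hat f$.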
What is Underfitting?
- Underfitting happens when a model is unable to capture the underlying pattern of the data.
- Underfitted models usually have high bias and low variance.
- It happens when we have too little data to build an accurate model, or when we try to fit a linear model to nonlinear data.
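The "linear model on nonlinear data" case is easy to reproduce. In this minimal sketch the data are hypothetical: y = x² plus Gaussian noise with standard deviation 0.5 (so the best achievable mean squared error is about 0.25), with 50 training and 200 test points. A straight line and a quadratic are both fitted with `np.polyfit`.

```python
import numpy as np

rng = np.random.default_rng(2)

def make_data(n):
    # Hypothetical nonlinear data: y = x^2 plus Gaussian noise (std 0.5).
    x = rng.uniform(-3, 3, n)
    return x, x**2 + rng.normal(0, 0.5, n)

x_train, y_train = make_data(50)
x_test, y_test = make_data(200)

def mse(coeffs, x, y):
    return np.mean((np.polyval(coeffs, x) - y) ** 2)

results = {}
for degree, label in ((1, "linear"), (2, "quadratic")):
    coeffs = np.polyfit(x_train, y_train, degree)
    results[label] = (mse(coeffs, x_train, y_train), mse(coeffs, x_test, y_test))
    print(f"{label:9s} train MSE = {results[label][0]:.2f}, test MSE = {results[label][1]:.2f}")
```

The linear model is far above the noise level on both the training and the test set, which is the signature of underfitting; the quadratic, whose shape matches the data, is close to it on both.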
What is Overfitting?
- Overfitting happens when our model captures the noise along with the underlying pattern in data.
- It happens when we train our model too much on a noisy dataset.
- Overfitted models have low bias and high variance.
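A small sketch of overfitting, under hypothetical assumptions: 12 evenly spaced training points from sin(2πx) with Gaussian noise (standard deviation 0.3), and 200 fresh test points. A degree-11 polynomial has 12 coefficients, enough to pass through every noisy training point, so it captures the noise along with the pattern.

```python
import numpy as np

rng = np.random.default_rng(3)
noise = 0.3

def f(x):
    return np.sin(2 * np.pi * x)  # hypothetical underlying pattern

x_train = np.linspace(0, 1, 12)
y_train = f(x_train) + rng.normal(0, noise, x_train.size)
x_test = rng.uniform(0, 1, 200)
y_test = f(x_test) + rng.normal(0, noise, x_test.size)

errors = {}
for degree in (3, 11):
    # Degree 11 has 12 coefficients, enough to pass through all 12 noisy points.
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    errors[degree] = (train_mse, test_mse)
    print(f"degree {degree:2d}: train MSE = {train_mse:.4f}, test MSE = {test_mse:.4f}")
```

The degree-11 fit reproduces its training points almost exactly, so its training error is essentially zero, while its error on new points is typically far higher.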
Bias and variance using a bull's-eye diagram:
The center of the target is the correct value; as we move away from the bull's-eye, our predictions get worse and worse.
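The bull's-eye picture can be simulated with a short Python sketch. The assumptions are ours: each "shot" is a 2-D point standing in for one model's prediction after training on a different dataset, the bull's-eye sits at the origin, and the offsets and spreads are made-up values for the four usual low/high bias and variance combinations. Bias is the distance from the average shot to the center; variance is the average squared distance of the shots from their own average.

```python
import numpy as np

rng = np.random.default_rng(42)
center = np.zeros(2)  # the bull's-eye: the correct value

# Hypothetical settings: (offset of the aim point, spread of the shots)
settings = {
    "low bias, low variance": (0.0, 0.1),
    "low bias, high variance": (0.0, 0.5),
    "high bias, low variance": (1.0, 0.1),
    "high bias, high variance": (1.0, 0.5),
}

results = {}
for name, (offset, spread) in settings.items():
    # Each shot stands for one model's prediction from a different training set.
    shots = rng.normal(loc=[offset, 0.0], scale=spread, size=(1000, 2))
    mean_shot = shots.mean(axis=0)
    bias = np.linalg.norm(mean_shot - center)
    variance = np.mean(np.sum((shots - mean_shot) ** 2, axis=1))
    mse = np.mean(np.sum((shots - center) ** 2, axis=1))
    results[name] = (bias, variance, mse)
    print(f"{name:26s} bias={bias:.2f} variance={variance:.3f} mean sq. distance={mse:.3f}")
```

In every setting the mean squared distance to the center equals bias² plus variance, the same split as the total-error formula without the irreducible noise term.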
Benefit of Bias Variance Tradeoff:
If our model is too simple and has very few parameters, it may have high bias and low variance; we call it an underfitted model. Such a model does not learn the data properly and is not fit for use in a production environment.
On the other hand, if our model has a large number of parameters, it is likely to have high variance and low bias; we call it an overfitted model. It works very well on the training dataset but fails to produce correct output on the test dataset.
So we need to find the right balance, without overfitting or underfitting the data.
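One common way to look for that balance is to vary the number of parameters and watch the error on data the model was not trained on. This Python sketch assumes a hypothetical setup (sin(2πx) plus Gaussian noise with standard deviation 0.3, 20 training points, 200 validation points) and uses polynomial degree as the measure of model size, since a degree-d polynomial has d + 1 parameters.

```python
import numpy as np

rng = np.random.default_rng(4)

def f(x):
    return np.sin(2 * np.pi * x)  # hypothetical true function

x_train = rng.uniform(0, 1, 20)
y_train = f(x_train) + rng.normal(0, 0.3, x_train.size)
x_val = rng.uniform(0, 1, 200)
y_val = f(x_val) + rng.normal(0, 0.3, x_val.size)

degrees = range(0, 11)  # number of parameters = degree + 1
train_mse, val_mse = [], []
for degree in degrees:
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse.append(np.mean((np.polyval(coeffs, x_train) - y_train) ** 2))
    val_mse.append(np.mean((np.polyval(coeffs, x_val) - y_val) ** 2))

best_degree = degrees[int(np.argmin(val_mse))]
for d, tr, va in zip(degrees, train_mse, val_mse):
    print(f"degree {d:2d}: train MSE = {tr:.3f}, validation MSE = {va:.3f}")
print("degree with lowest validation error:", best_degree)
```

Training error can only go down as parameters are added, so it cannot tell underfitting from overfitting on its own; the validation error is what reveals where the extra parameters stop helping.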
Bias Variance tradeoff:
We need to find a good balance between bias and variance such that it minimizes the total error.
$$\text{Total Error} = \text{Bias}^2 + \text{Variance} + \text{Irreducible Error}$$
The model should be trained so that we achieve the optimal balance between bias and variance and avoid both overfitting and underfitting.
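The total-error formula can be checked numerically. This minimal Python sketch rests on a made-up setup: a hypothetical true function sin(2πx), Gaussian noise with standard deviation 0.3 (so the irreducible error is 0.09), 20 fixed training inputs, and polynomial models of degree 1, 3 and 9. At the point x0 = 0.25 it estimates bias² and variance from 2000 refitted models and compares their sum plus the noise variance with the squared error measured directly on fresh observations.

```python
import numpy as np

rng = np.random.default_rng(5)
noise = 0.3                      # std of the noise, so irreducible error = 0.09
x_train = np.linspace(0, 1, 20)  # fixed training inputs
x0 = 0.25                        # point where the error is decomposed

def f(x):
    return np.sin(2 * np.pi * x)  # hypothetical true function

def decompose(degree, n_trials=2000):
    preds = np.empty(n_trials)
    sq_errors = np.empty(n_trials)
    for i in range(n_trials):
        y_train = f(x_train) + rng.normal(0, noise, x_train.size)
        preds[i] = np.polyval(np.polyfit(x_train, y_train, degree), x0)
        y0 = f(x0) + rng.normal(0, noise)  # a fresh observation at x0
        sq_errors[i] = (y0 - preds[i]) ** 2
    bias_sq = (preds.mean() - f(x0)) ** 2
    variance = preds.var()
    return bias_sq, variance, noise**2, sq_errors.mean()

rows = {}
for degree in (1, 3, 9):
    bias_sq, variance, irreducible, total = decompose(degree)
    rows[degree] = (bias_sq, variance, irreducible, total)
    print(f"degree {degree}: bias^2={bias_sq:.3f} variance={variance:.3f} "
          f"irreducible={irreducible:.3f} sum={bias_sq + variance + irreducible:.3f} "
          f"measured total={total:.3f}")
```

The sum and the measured total agree only up to simulation noise, because the decomposition holds in expectation. Going from degree 1 to degree 9 trades bias² for variance, which is the tradeoff described above.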