When we say that a "model is ready", there should be techniques or methods to evaluate its readiness before deploying it to production. We are going to learn the techniques below to evaluate a machine learning model.
- Confusion Matrix
- F1 Score
- ROC (and AUC)
A Confusion Matrix is a table that helps measure how efficient a model is, i.e., how well the model performs. Below are the terms used in the confusion matrix.
We will discuss each term used in the confusion matrix through an example.
Let's say we have a dataset of 100 records, and the model predicts whether a person has more than $5000 in their account.
The model's capability is calculated by comparing the actual and predicted values; the statistics are given below:
1 - True Positive (TP): Actual is Yes and prediction is also Yes.
Here TP is 50.
True Positive Rate (TPR) = TP / (Total Actual Yes) = 50/55
TPR is also known as Sensitivity or Recall.
2 - True Negative (TN): Actual is No and prediction is also No.
Here TN is 30.
True Negative Rate (TNR) = TN / (Total Actual No) = 30/45
TNR is also known as Specificity.
3 - False Positive (FP): Actual is No but prediction is Yes.
Here FP is 15.
It is also called a Type-1 error.
False Positive Rate (FPR) = FP / (Total Actual No) = 15/45
Example: a false fire alarm.
4 - False Negative (FN): Actual is Yes but prediction is No.
Here FN is 5.
It is also called a Type-2 error.
False Negative Rate (FNR) = FN / (Total Actual Yes) = 5/55
Example: a pregnancy test reports that a woman is not pregnant when she actually is.
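The four counts and rates above can be computed directly from the example numbers (TP = 50, TN = 30, FP = 15, FN = 5). A minimal sketch in Python:

```python
# Confusion-matrix counts taken from the example in the text
TP, TN, FP, FN = 50, 30, 15, 5

actual_yes = TP + FN  # total records that are actually Yes (55)
actual_no = TN + FP   # total records that are actually No (45)

tpr = TP / actual_yes  # True Positive Rate (Sensitivity / Recall)
tnr = TN / actual_no   # True Negative Rate (Specificity)
fpr = FP / actual_no   # False Positive Rate (Type-1 error rate)
fnr = FN / actual_yes  # False Negative Rate (Type-2 error rate)

print(f"TPR={tpr:.3f}, TNR={tnr:.3f}, FPR={fpr:.3f}, FNR={fnr:.3f}")
```

Note that TPR + FNR = 1 and TNR + FPR = 1, since each pair shares the same denominator.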
Now, putting these terms together in the confusion matrix:
Accuracy = (Number of samples predicted correctly) / (Total number of samples)
Accuracy = (TP + TN) / N = (50 + 30) / 100 = 0.8, i.e., 80% accuracy.
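The accuracy calculation above, continuing with the same example counts:

```python
# Same confusion-matrix counts as in the example
TP, TN, FP, FN = 50, 30, 15, 5
N = TP + TN + FP + FN  # total number of samples (100)

# Accuracy = correctly predicted samples / all samples
accuracy = (TP + TN) / N
print(accuracy)  # 0.8
```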
Why Other Performance Measures if Accuracy is there?
Let's take an example to understand why we need measures other than accuracy.
Let's say a financial institution (FI) has the statistics below.
The most important priority for any FI is to identify the fraudulent transactions in the dataset.
But when we train a machine learning model on such data, it may become biased toward the majority class and always predict "non-fraudulent".
So far we know only the accuracy measure for judging the model's capability:
Can any financial institution adopt a model just because it achieves 99% accuracy?
The answer is a big "NO". The FI requires a model that can predict fraudulent transactions accurately. Here TP is 0, which means the model never identifies a fraudulent transaction. So the model is not doing well for this company, irrespective of its high accuracy.
We therefore need other measures as well to assess the capability of the model; we cannot trust accuracy alone.
We can conclude that accuracy is not a sufficient measure to evaluate the model.
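The fraud scenario above can be sketched with a hypothetical class split (the 990/10 counts are illustrative assumptions, not taken from the FI's actual table). A model that always predicts "non-fraudulent" reaches 99% accuracy while catching zero frauds:

```python
# Hypothetical imbalanced dataset: 990 legitimate, 10 fraudulent transactions.
# These counts are assumptions for illustration only.
n_legit, n_fraud = 990, 10

# A degenerate "model" that always predicts the majority class (non-fraudulent):
TP = 0        # no fraud is ever flagged
FN = n_fraud  # every fraudulent transaction is missed
TN = n_legit  # every legitimate transaction is "correct" by default
FP = 0

accuracy = (TP + TN) / (n_legit + n_fraud)  # looks impressive: 0.99
recall = TP / (TP + FN)                     # the measure that matters here: 0.0

print(f"accuracy={accuracy:.2f}, recall={recall:.2f}")
```

Despite the 99% accuracy, the recall (True Positive Rate) on the fraud class is 0, which is exactly why accuracy alone cannot be trusted on imbalanced data.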