The problem with machine learning models is that we don’t know how well a model performs until we test it on an independent data set, one that was not used to train the model.
We need some assurance that the model generalises: that it neither underfits (high bias) nor overfits (high variance).
Cross validation comes to the rescue here and helps us estimate the performance of the model. One type of cross validation is K-Fold cross validation.
Cross validation is a technique that reserves a subset of the dataset for testing the model before finalising it, while the rest of the data is used for training the model.
Below are the steps involved in cross validation:
There are several cross validation techniques, and K-Fold cross validation is one of them. It is among the most widely used, both in industry and in hackathons.
K-Fold Cross Validation-
K-Fold CV splits the dataset into K folds, and each fold is used as the testing set exactly once.
For example, suppose the dataset is split into 3 folds. In the first iteration, the first fold is used for testing and the remaining folds are used for training the model. In the second iteration, the second fold is used for testing and the rest are used for training. This process is repeated until each of the 3 folds has been used as the testing set.
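To make the 3-fold rotation concrete, here is a minimal sketch that prints which sample indices land in the training and testing sets on each iteration. It assumes scikit-learn's KFold class and a toy array of six samples; the array and variable names are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold

# Six toy samples (indices 0..5); the values themselves do not matter here.
X = np.arange(6).reshape(6, 1)

# Without shuffling, each fold is a contiguous block of indices.
kf = KFold(n_splits=3)
splits = [(train_idx.tolist(), test_idx.tolist())
          for train_idx, test_idx in kf.split(X)]

for i, (train_idx, test_idx) in enumerate(splits, start=1):
    print(f"Iteration {i}: train={train_idx} test={test_idx}")
# Iteration 1: train=[2, 3, 4, 5] test=[0, 1]
# Iteration 2: train=[0, 1, 4, 5] test=[2, 3]
# Iteration 3: train=[0, 1, 2, 3] test=[4, 5]
```

Every sample appears in the testing set exactly once and in the training set on the other two iterations.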
Algorithm steps of K-Fold cross validation-
- Split the entire dataset randomly into K folds. K should be neither too small nor too large; 5 to 10 is a common choice, depending on the data size.
- Fit (train) the model on K-1 folds, validate it on the remaining fold, and record the score.
- Repeat this process until every fold has served as the test set.
- Take the average of the recorded scores. This is the performance estimate for the model.
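The steps above can be sketched by hand with NumPy alone. This is only an illustration under stated assumptions: the helper name k_fold_scores, the synthetic data, the least-squares linear model, and the choice of R² as the score are all made up for the example and are not prescribed by the algorithm.

```python
import numpy as np

def k_fold_scores(X, y, k=5, seed=0):
    """Return one R^2 score per fold for a least-squares linear model."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))       # step 1: shuffle the sample indices
    folds = np.array_split(indices, k)      # ...and split them into k folds
    scores = []
    for i in range(k):                      # step 3: every fold is the test set once
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        # Step 2: fit on the other k-1 folds (extra column of ones = intercept).
        A_train = np.column_stack([X[train_idx], np.ones(len(train_idx))])
        coef, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)
        # Validate on the held-out fold and record its R^2.
        A_test = np.column_stack([X[test_idx], np.ones(len(test_idx))])
        pred = A_test @ coef
        ss_res = np.sum((y[test_idx] - pred) ** 2)
        ss_tot = np.sum((y[test_idx] - y[test_idx].mean()) ** 2)
        scores.append(1.0 - ss_res / ss_tot)
    return np.array(scores)

# Synthetic regression data (assumed for the example).
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

scores = k_fold_scores(X, y, k=5)
mean_score = scores.mean()                  # step 4: average the recorded scores
print(scores, mean_score)
```

In practice a library routine would normally replace this loop; writing it out once just shows that each score comes from a model that never saw the fold it is evaluated on.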
We can use either of the scikit-learn functions below to implement K-Fold cross validation.
cross_val_score(model_name, X, y, cv=10)
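It returns an array with one score per fold (10 here). With the default scoring, these are R² scores for a regressor and accuracy scores for a classifier.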
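A short, hedged example of the call in context. The synthetic datasets and the two estimators (LinearRegression, LogisticRegression) are assumptions chosen for illustration; any scikit-learn estimator could take their place.

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import cross_val_score

# Regressor: the default score is R^2, one value per fold.
X_reg, y_reg = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
r2_scores = cross_val_score(LinearRegression(), X_reg, y_reg, cv=10)
print("R^2 per fold:", r2_scores, "mean:", r2_scores.mean())

# Classifier: the default score is accuracy instead.
X_clf, y_clf = make_classification(n_samples=200, n_features=5, random_state=0)
acc_scores = cross_val_score(LogisticRegression(max_iter=1000), X_clf, y_clf, cv=10)
print("Accuracy per fold:", acc_scores, "mean:", acc_scores.mean())

# The metric can also be chosen explicitly through the scoring parameter.
mse_scores = cross_val_score(LinearRegression(), X_reg, y_reg, cv=10,
                             scoring="neg_mean_squared_error")
print("Negative MSE per fold:", mse_scores)
```

Note that scikit-learn reports error metrics such as MSE as negative values so that a higher score is always better.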
cross_val_predict(model_name, X, y, cv=10)
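It returns an array of predictions, one per sample. Each prediction is made by a model trained on the folds that did not contain that sample.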
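As a minimal sketch of how these out-of-fold predictions might be used, the example below compares them with the true targets. The synthetic data and the LinearRegression estimator are assumptions made for the example.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import cross_val_predict

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# One out-of-fold prediction per sample, in the original sample order.
y_pred = cross_val_predict(LinearRegression(), X, y, cv=10)

# A single score computed over all out-of-fold predictions at once.
overall_r2 = r2_score(y, y_pred)
print(y_pred[:5], overall_r2)
```

This pooled score is not the same quantity as the average of the per-fold scores from cross_val_score, so the two numbers can differ slightly. The predictions are mainly useful for diagnostics such as residual plots.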