- It is especially popular among Kaggle competitors.
- It is an optimized, distributed gradient boosting library.
- It uses the gradient descent technique to reduce the error of the trees.
- It provides parallel tree boosting (also known as GBDT or GBM) that solves many data science problems quickly and accurately.
- XGBoost executes quickly and often gives more accurate results.
XGBoost Features:
We should understand boosting before diving deep into XGBoost.
How does boosting work?
Boosting is a sequential technique that works on the principle of an ensemble. It combines a set of weak learners to deliver improved prediction accuracy. At each step, the outcomes predicted correctly are given a lower weight and the misclassified ones are weighted higher, so the next learner concentrates on the hard cases.
Let’s understand boosting with a simple illustration.
Box-1: The first weak learner misclassifies two (+) points.
Box-2: The second classifier gives more weight to the two misclassified (+) points and so reduces the error of the previous classifier, but it misclassifies two (-) points.
Box-3: The third classifier gives more weight to the two misclassified (-) points and reduces the error of the previous classifier, but it misclassifies two (+) points and one (-) point.
Box-4: This is the combination of the weak learners from Box-1, Box-2 and Box-3, and it does a good job of classifying all the (+) and (-) points.
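The reweighting idea in the boxes can be sketched in a few lines of plain Python. This is a minimal AdaBoost-style illustration, not XGBoost itself: the ten data points, the one-split "stump" learner and the helper names (`best_stump`, `boosting_round`, `ensemble_predict`) are all assumptions made up for the example.

```python
import math

def stump_predict(x, threshold, polarity):
    """One-split weak learner: `polarity` left of the threshold, `-polarity` otherwise."""
    return polarity if x < threshold else -polarity

def best_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the stump with the lowest weighted error."""
    best = None
    values = sorted(set(xs))
    thresholds = [values[0] - 1.0] + [v + 0.5 for v in values]
    for threshold in thresholds:
        for polarity in (1, -1):
            error = sum(
                w for x, y, w in zip(xs, ys, weights)
                if stump_predict(x, threshold, polarity) != y
            )
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best

def boosting_round(xs, ys, weights):
    """Fit one stump, then raise the weights of the points it got wrong."""
    error, threshold, polarity = best_stump(xs, ys, weights)
    error = min(max(error, 1e-10), 1 - 1e-10)  # keep the logarithm finite
    alpha = 0.5 * math.log((1 - error) / error)  # the stump's say in the final vote
    new_weights = [
        w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
        for x, y, w in zip(xs, ys, weights)
    ]
    total = sum(new_weights)
    return (alpha, threshold, polarity), [w / total for w in new_weights]

def ensemble_predict(x, learners):
    """Weighted vote of all stumps (the 'Box-4' combination)."""
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in learners)
    return 1 if score >= 0 else -1

# Hypothetical toy data: no single split separates the +1 and -1 labels.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

weights = [1 / len(xs)] * len(xs)
learners = []
for _ in range(3):  # three weak learners, as in the illustration
    learner, weights = boosting_round(xs, ys, weights)
    learners.append(learner)

correct = sum(ensemble_predict(x, learners) == y for x, y in zip(xs, ys))
print(f"ensemble training accuracy: {correct / len(xs):.2f}")
```

Each individual stump makes mistakes on this data, yet the weighted vote can do better because every round shifts weight onto the points the previous stump got wrong.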
How does XGBoost work?
XGBoost is an algorithm that belongs to the boosting family. It generates weak learners sequentially, each one reducing the error of the previous classifier, and combines all the weak learners into a strong learner that achieves good accuracy.
Training proceeds iteratively, adding new trees that predict the residuals (errors) of the prior trees; these are then combined with the previous trees to make the final prediction.
It’s called gradient boosting because it uses a gradient descent algorithm to minimise the loss when adding new models.
This approach supports both regression and classification predictive modeling problems.
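The residual-fitting loop described above can be sketched as follows. This is plain gradient boosting with squared loss and one-split trees, written only to show the idea; it is not how XGBoost is implemented internally. The data and the helper names (`fit_stump`, `gradient_boost`) are hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the split with the lowest squared error; return (threshold, left_value, right_value)."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x < threshold]
        right = [r for x, r in zip(xs, residuals) if x >= threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]

def mean_squared_error(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

def gradient_boost(xs, ys, n_trees=20, learning_rate=0.3):
    """Add stumps one at a time, each fitted to the current residuals."""
    base = sum(ys) / len(ys)  # start from the mean prediction
    preds = [base] * len(ys)
    trees = []
    history = [mean_squared_error(ys, preds)]
    for _ in range(n_trees):
        # For the loss 0.5 * (y - p)^2 the negative gradient is the residual y - p.
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        trees.append((threshold, left_value, right_value))
        preds = [
            p + learning_rate * (left_value if x < threshold else right_value)
            for x, p in zip(xs, preds)
        ]
        history.append(mean_squared_error(ys, preds))
    return base, trees, history

# Hypothetical regression data.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 1.9, 3.1, 3.0, 4.8, 5.1, 5.9]

base, trees, history = gradient_boost(xs, ys)
print(f"MSE before boosting: {history[0]:.3f}")
print(f"MSE after {len(trees)} trees: {history[-1]:.3f}")
```

The training error shrinks as trees are added, and the `learning_rate` factor controls how much of each tree's correction is applied.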
Ques- What is the difference between GBM and XGBoost?
Both XGBoost and GBM follow the principle of gradient boosting. There are, however, differences in the modeling details. Specifically, XGBoost uses a more regularised model formalisation to control over-fitting, which gives it better performance.
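To make "more regularised" concrete, the objective described in the XGBoost documentation adds a complexity penalty for every tree to the training loss. In the sketch below, $l$ is the loss, $f_k$ is the $k$-th tree, $T$ is the number of leaves in a tree, $w_j$ are its leaf weights, and $\gamma$ and $\lambda$ are penalty strengths; an L1 penalty on the leaf weights can be added in the same way.

```latex
\mathrm{Obj} = \sum_{i=1}^{n} l\bigl(y_i, \hat{y}_i\bigr) + \sum_{k=1}^{K} \Omega(f_k),
\qquad
\Omega(f) = \gamma T + \frac{1}{2}\,\lambda \sum_{j=1}^{T} w_j^{2}
```

Larger penalties favour trees with fewer leaves and smaller leaf weights, which is how the model keeps over-fitting in check.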
Ques- What is the difference between AdaBoost and XGBoost?
In AdaBoost, shortcomings are identified by high-weight data points; in XGBoost, shortcomings are identified by gradients.
Ques- What is the difference between the XGBoost, LightGBM and CatBoost algorithms?
Generally, in terms of accuracy, XGBoost is better than LightGBM and CatBoost; in terms of speed, however, LightGBM is better than XGBoost and CatBoost. CatBoost may underperform in comparison with XGBoost and LightGBM, although the ranking can vary with the dataset and the tuning.
Installation and Implementation of XGBoost
Installation- We can download and install it by running:
pip3 install xgboost
Ensure that you have downloaded one of the following wheel files:
Please find the full implementation here.
Parameters and ways to tune them:
1- max_depth: Maximum tree depth for base learners. Default value is 3 in older releases of the scikit-learn wrapper; check the documentation for your installed version.
2- learning_rate: Boosting learning rate. Default value is 0.1 in older releases of the scikit-learn wrapper.
3- n_estimators: Number of boosted trees to fit.
4- n_jobs: Number of parallel threads used to run XGBoost.
5- reg_alpha: L1 regularization term on weights.
6- reg_lambda: L2 regularization term on weights.
XGBoost Parameter Tuning:
We can use the GridSearchCV class from scikit-learn to tune the parameters.