CatBoost is an algorithm for gradient boosting on decision trees, developed by Yandex researchers and engineers. It is the first Russian machine learning technology to be released as open source.
It is widely used within the company for ranking tasks, forecasting, and making recommendations. It is a general-purpose algorithm that can be applied to a wide range of areas and problems.
With most machine learning algorithms, we have to convert string categories to a numerical format using label encoding or one-hot encoding before we can build a model. CatBoost can use categorical features directly, and it can be used for both regression and classification problems. The name CatBoost comes from two words, “Category” and “Boosting”.
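To make the two encodings mentioned above concrete, here is a minimal pure-Python sketch of what label encoding and one-hot encoding do to a string column. The `colors` data is made up for illustration, and real projects would normally use a library encoder rather than hand-written code like this.

```python
# Toy categorical column (hypothetical data, for illustration only).
colors = ["red", "blue", "red", "green"]

# Label encoding: map each distinct category to an integer.
categories = sorted(set(colors))  # fixed, reproducible category order
label_map = {category: index for index, category in enumerate(categories)}
label_encoded = [label_map[color] for color in colors]

# One-hot encoding: one 0/1 column per distinct category.
one_hot = [
    [1 if color == category else 0 for category in categories]
    for color in colors
]

print(label_encoded)
print(one_hot)
```

This is the preprocessing step that CatBoost lets you skip for columns declared as categorical.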
If you don’t pass anything in the cat_features argument, CatBoost will treat all the columns as numerical variables.
We have to list the variables in cat_features to make the algorithm treat them as categorical. If a feature has a string datatype and is not listed in cat_features, the algorithm will throw an error.
“Boost” comes from the gradient boosting machine learning algorithm.
Prediction rate for CatBoost, XGBoost and LightGBM:
According to the Yandex benchmark, CatBoost can take longer to train than other GBDT implementations (XGBoost, LightGBM); however, its prediction time is 13–16 times faster than that of the other libraries.
Log-loss comparison for different algorithms across a variety of problems:
Installation of CatBoost:
pip install catboost
pip3 install catboost
Please find full code here.