CatBoost is an algorithm for gradient boosting on decision trees, developed by Yandex researchers and engineers. It is the first Russian machine learning technology to be released as open source.
It is widely used within the company for ranking, forecasting and recommendation tasks. It is universal and can be applied across a wide range of areas and to a variety of problems.

With most libraries, we have to convert string categories into a numerical format, using label encoding or one-hot encoding, before we can develop a model. CatBoost can use categorical features directly, and it can be used for both regression and classification problems. The name CatBoost comes from two words: “Category” and “Boosting”.

If you don’t pass anything in the cat_features argument, CatBoost treats all the columns as numerical variables.
We have to list the categorical columns in cat_features to make the algorithm treat them as categorical. If a feature has a string data type and is not listed in cat_features, the algorithm will throw an error.

The “Boost” part of the name comes from gradient boosting, the machine learning algorithm CatBoost is built on.

CatBoost Features:

[Figure: CatBoost features]

Prediction rate for CatBoost, XGBoost and LightGBM:

According to the Yandex benchmark, CatBoost training can take longer than other GBDT implementations (XGBoost, LightGBM). However, its prediction is 13–16 times faster than the other libraries.

[Figure: prediction rate comparison]

Log-loss comparison for different algorithms on a variety of problems:

[Figure: log-loss comparison; CatBoost has the lowest log loss]
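For reference, binary log loss (cross-entropy) is the average of -[y·log(p) + (1 - y)·log(1 - p)] over all samples. Here y is the true 0/1 label and p is the predicted probability of class 1, and lower values are better. The plain-Python sketch below needs no CatBoost. The function name binary_log_loss and the clipping constant eps are illustrative choices, not taken from the benchmark.

```python
import math


def binary_log_loss(y_true, y_prob, eps=1e-15):
    """Average binary cross-entropy; lower is better."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip probabilities so log(0) is never evaluated.
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


# Confident, correct predictions give a small loss.
print(binary_log_loss([1, 0, 1], [0.9, 0.1, 0.8]))
```
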

Installing CatBoost:

pip install catboost

or, where the Python 3 installer is named pip3:

pip3 install catboost


CatBoost implementation

Please find full code here.

