A Hands-on Guide to Creating Explainable Gradient Boosting Classification Models Using Bayesian Hyperparameter Optimization.
Boosted decision tree algorithms, such as XGBoost, CatBoost, and LightBoost, are popular methods for classification tasks. Learn how to split the data, optimize hyperparameters, prevent overtraining, select the best-performing model, and create explainable results.
This blog is part of a series. The first part explains the general concepts of gradient boosting techniques such as XGBoost, CatBoost, and LightBoost, together with the process of tuning hyperparameters and details about the HGBoost library. In this second part, I will demonstrate in more detail: 1. how to train a gradient boosting classification model with hyperparameters tuned using Bayesian optimization, 2. how to select the best-performing model that is not overtrained, and 3. how to create explainable results by visually inspecting the optimized hyperparameter space together with the model performance.
A brief introduction.
Gradient boosting algorithms such as Extreme Gradient Boosting (XGBoost), Light Gradient Boosting (LightBoost), and CatBoost are powerful ensemble machine learning algorithms for predictive modeling that can be applied to tabular and continuous data, for both classification and regression tasks [1,2,3]. Here I will focus on the classification task. If you need more background or are not entirely familiar with some of the concepts, I recommend reading A Guide to Find the Best Boosting Model using Bayesian Hyperparameter Tuning but without Overfitting. Before we go to the hands-on example, I will first briefly discuss the HGBoost library, because we will use this single library to do all the tasks.
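To give a first impression of what the hands-on part will look like, below is a minimal sketch of a classification run with the hgboost library. It assumes the hgboost API (the hgboost() initializer and the .xgboost(), .plot_params(), .plot(), .plot_validation(), and .treeplot() methods); the dataset and parameter values (max_eval, cv, test_size, val_size) are illustrative choices, not prescriptions from this article, and the full walk-through follows in the sections below.

```python
# pip install hgboost
from sklearn.datasets import load_breast_cancer
from hgboost import hgboost

# Example data: a small, publicly available classification set (illustrative choice).
X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# Initialize hgboost with the number of Bayesian evaluations, cross-validation folds,
# and independent test/validation splits (values here are illustrative, not prescriptive).
hgb = hgboost(max_eval=250, cv=5, test_size=0.2, val_size=0.2, random_state=42)

# Train an XGBoost classifier; hyperparameters are tuned with Bayesian optimization,
# and the selected model is evaluated on the held-out sets to guard against overtraining.
results = hgb.xgboost(X, y, pos_label=1)

# Visualize the explored hyperparameter space and the resulting model performance.
hgb.plot_params()      # hyperparameter space and the selected optimum
hgb.plot()             # summary of all evaluated models
hgb.plot_validation()  # performance on the independent validation set
hgb.treeplot()         # the best tree of the final model
```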