What is LightGBM (Light Gradient Boosting) + Example Python Code

Nikol Holicka
4 min read · Feb 25, 2020


This blogpost is an introduction to what LightGBM is, how it differs from other decision tree algorithms, what novel features it brings, and a code example of running it. It is accompanied by a couple of beautiful paintings by Emil Carlsen, who very much enjoyed painting trees rather than planting them in Python.

The lightness! (Emil Carlsen, Opaline Sea)

What is Gradient Boosting?

Gradient Boosting is a method in which weak learners are added sequentially and continuously improve into a strong learner. Unlike Random Forest, in which all trees are built independently, Boosted Trees tend to reach higher accuracy thanks to this continuous learning.
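The residual-fitting loop behind this idea can be sketched in a few lines. This is a toy illustration for squared error, assuming scikit-learn trees as the weak learners; the data and settings are my own, not from any particular library's internals:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# toy regression data: y = x^2 plus a little noise
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.1, size=200)

# boosting for squared error: each weak tree fits the residuals
# (the negative gradient) of the current ensemble's predictions
learning_rate = 0.1
prediction = np.full_like(y, y.mean())  # start from a constant model
trees = []
for _ in range(100):
    residuals = y - prediction
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

# the ensemble error shrinks as trees are added
print(np.mean((y - prediction) ** 2))
```

Each tree only has to correct what the ensemble so far gets wrong, which is the "continuous learning" that sets boosting apart from a Random Forest's independent trees.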

One of the most popular implementations is XGBoost. It is known for its popularity on Kaggle, its speed, and its reliable performance on multiclass classification projects. XGBoost is only one of several Gradient Boosting Decision Tree (GBDT) implementations.

The problem with Gradient Boosting Decision Trees

The trees in GBDTs are trained in sequence by evaluating the residual errors of each iteration and improving the next one. To choose each split, GBDTs need to compute the information gain across all instances and consider all possible split points. This is very time-consuming! As a result, with the emergence of big data, they face challenges, especially insufficient speed.
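To see why this is costly, here is a hedged sketch of the exhaustive split search a GBDT performs for a single feature. The variance-gain formula and the toy data are illustrative assumptions, not LightGBM's actual code:

```python
import numpy as np

def best_split(feature, gradients):
    """Exhaustive split search: score every candidate threshold.

    Uses a standard GBDT variance-gain criterion; with n instances
    there are O(n) candidate thresholds to score, and this has to be
    repeated for every feature at every node - the cost the text
    describes.
    """
    order = np.argsort(feature)
    f, g = feature[order], gradients[order]
    best_gain, best_thr = -np.inf, None
    for i in range(1, len(f)):                 # every possible split point
        left, right = g[:i], g[i:]
        gain = left.sum() ** 2 / len(left) + right.sum() ** 2 / len(right)
        if gain > best_gain:
            best_gain, best_thr = gain, (f[i - 1] + f[i]) / 2
    return best_thr, best_gain

# gradients jump at x = 0.5, so that is where the best split lies
rng = np.random.RandomState(1)
x = rng.normal(size=1000)
g = np.where(x > 0.5, 1.0, -1.0) + rng.normal(0, 0.1, size=1000)
print(best_split(x, g))
```

LightGBM's two novel features, described below, attack exactly this loop: one shrinks the number of instances, the other the number of features.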

I have encountered this problem when running a grid search for XGBoost model. Training the model took ages! So, when I found out about LightGBM, I was intrigued.

What is Light Gradient Boosting?

LightGBM is one of the more novel GBDT implementations. It was developed by a team of researchers at Microsoft in 2016. To put it simply, LightGBM introduces two novel features that are not present in XGBoost. Their purpose is to help the algorithm cope with a large number of variables and data instances.

What are those novel features?

1. Gradient-based One-Side Sampling

This is a sampling method that reduces the amount of data that a decision tree uses for learning.

This sampling method considers the size of the gradient (that is, the training error). It keeps the instances where the error is still large, while the instances with a small error are randomly sampled before being introduced to the tree. As a result, each tree has to crunch through less data!
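A simplified sketch of how such one-side sampling could look. The rates, reweighting, and function name are illustrative assumptions; LightGBM's internal implementation differs in detail:

```python
import numpy as np

def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Gradient-based One-Side Sampling (sketch).

    Keep the top_rate fraction of instances with the largest absolute
    gradient, randomly sample other_rate of the rest, and up-weight
    the sampled small-gradient instances by (1 - top_rate) / other_rate
    so the estimated information gain stays roughly unbiased.
    """
    n = len(gradients)
    n_top = int(n * top_rate)
    n_other = int(n * other_rate)
    order = np.argsort(-np.abs(gradients))   # largest |gradient| first
    top_idx = order[:n_top]                  # large-error instances: always kept
    rng = np.random.RandomState(seed)
    rest_idx = rng.choice(order[n_top:], size=n_other, replace=False)
    idx = np.concatenate([top_idx, rest_idx])
    weights = np.ones(len(idx))
    weights[n_top:] = (1 - top_rate) / other_rate   # compensate the sampling
    return idx, weights

gradients = np.random.RandomState(42).normal(size=1000)
idx, w = goss_sample(gradients)
print(len(idx))   # only 300 of the 1000 instances reach the tree
```

With these example rates, each tree sees only 30% of the data, yet the reweighting keeps the split-gain estimates close to what the full dataset would give.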

2. Exclusive Feature Bundling

As you can guess from the name, this method reduces the number of features (variables). Very often, a large number of the features in your dataset are sparse (mostly zeros), especially if you work with many categorical variables. Many of these features are also mutually exclusive: they never take nonzero values at the same time. This method bundles such features together into a single feature.
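A minimal sketch of the bundling idea, assuming the two features are strictly exclusive. Real EFB tolerates a small conflict rate and works on histogram bins; the helper below is my own illustration:

```python
import numpy as np

def bundle_exclusive(f1, f2):
    """Exclusive Feature Bundling (simplified).

    Two features that are never nonzero on the same row can share one
    column: f2's values are shifted by an offset so the value ranges
    of the two originals stay distinguishable inside the bundle.
    """
    assert not np.any((f1 != 0) & (f2 != 0)), "features must be mutually exclusive"
    offset = f1.max()   # bundled values above this offset belong to f2
    return f1 + np.where(f2 != 0, f2 + offset, 0)

# one-hot columns from the same categorical variable are mutually exclusive
f1 = np.array([1, 0, 0, 2, 0])
f2 = np.array([0, 3, 1, 0, 0])
print(bundle_exclusive(f1, f2))   # [1 5 3 2 0]
```

Two sparse columns have collapsed into one dense one, so the split search from earlier has one less feature to scan.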

Both features contribute to the general advantage of LightGBM over XGBoost and other GBDTs: it is less computationally expensive and thus faster! You can find more specific information about its structure in this paper and in the official documentation.

Those are some tall straight trees — maybe you want to decrease max_depth and increase num_leaves? (Emil Carlsen, Trees in Forest)

The benefits can be:

  • Faster training speed
  • Lower memory usage
  • Higher accuracy
  • GPU and parallel learning
  • Easier use of large-scale data

Some parameters to tune

If you have used GBDTs before, you will be familiar with most of these. Here is a list of parameters you can tune or feed into a grid search to find your optimal combination.

  • max_depth — limits tree complexity and prevents overfitting
  • num_leaves — limits complexity and prevents overfitting; should be smaller than 2^(max_depth)
  • bagging_fraction — the fraction of data used for each iteration; smaller values increase speed
  • learning_rate — smaller values tend to increase accuracy (at the cost of more iterations)
  • num_iterations — number of boosting iterations; the default is 100, increase for higher accuracy
  • device — options: ‘gpu’ or ‘cpu’; choose ‘gpu’ (graphical processing unit) for faster computation

Using LightGBM

Install the package

# install and import the package
!pip install lightgbm
import lightgbm as lgb

This is an example of a pipeline using a MinMax scaler, PCA compression, grid search and, of course, LightGBM!

# imports for the pipeline and grid search
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score

# instantiate the classifier
LGBM_pipeline = lgb.LGBMClassifier()

# build the pipeline
lgb_baseline_grid = Pipeline([('scl', MinMaxScaler()),
                              ('pca', PCA(n_components=45)),
                              ('clf', LGBM_pipeline)])

# set grid search parameters; the 'clf__' prefix routes them to the classifier step
param_grid_lgb = {
    'clf__learning_rate': [0.1, 0.2],  # smaller values make the model more robust by shrinking each step
    'clf__max_depth': [20, 40, 80],    # max depth of a tree, controls overfitting
    'clf__min_child_weight': [40],     # minimum sum of instance weights required in a child, higher values reduce overfitting
    'clf__subsample': [0.6],           # the fraction of observations to be randomly sampled for each tree
    'clf__n_estimators': [50, 100],
}
grid_lgb = GridSearchCV(lgb_baseline_grid, param_grid_lgb, scoring='accuracy', cv=None, n_jobs=1)

# train the model
grid_lgb.fit(X_train, y_train)
best_parameters = grid_lgb.best_params_
print('Grid Search found the following optimal parameters:')
for param_name in sorted(best_parameters.keys()):
    print('%s: %r' % (param_name, best_parameters[param_name]))
training_preds = grid_lgb.predict(X_train)
test_preds = grid_lgb.predict(X_test)
training_accuracy = accuracy_score(y_train, training_preds)
test_accuracy = accuracy_score(y_test, test_preds)

# print the training and validation accuracy of the optimal grid search result
print('')
print('Training Accuracy: {:.4}%'.format(training_accuracy * 100))
print('Validation accuracy: {:.4}%'.format(test_accuracy * 100))

What were my results?

In my case, I noticed a big increase in training speed compared to XGBoost, and the accuracy of my model improved by 1%. That is not a large change, but the speed-up in training can make a big difference if you are training your model on a large dataset with many parameters in your grid search!

A gorgeous tree (Afternoon Landscape, Emil Carlsen)
