Week 47: From Decision Trees to Ensemble Methods, Random Forests and Boosting Methods

Ensemble Methods: From a Single Tree to Many Trees and Extreme Boosting, Meet the Jungle of Methods

As stated above, and as seen in many of the single-tree examples discussed here, a single decision tree often ends up overfitting the training data. This normally means that the model has a high variance: the fitted tree changes substantially when the training data change. Can we reduce the variance of a statistical learning method?
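One way to build intuition for this question is to look at what averaging does to noisy predictions. The sketch below is only an idealized illustration: it assumes each model's prediction is an independent Gaussian draw with the same variance, and the function name `variance_of_average` and its parameters are hypothetical choices made for this example. Under that assumption the variance of the average of B predictions is roughly the single-model variance divided by B. Real trees trained on overlapping data are correlated, so the reduction seen in practice is smaller.

```python
import random
import statistics


def variance_of_average(n_models, n_trials=2000, sigma=1.0, seed=0):
    """Empirical variance of the mean of n_models independent noisy predictions.

    Assumption (for illustration only): each prediction is an independent
    Gaussian draw with mean 0 and standard deviation sigma.
    """
    rng = random.Random(seed)
    averages = []
    for _ in range(n_trials):
        # One "ensemble": n_models independent noisy predictions of the same target
        predictions = [rng.gauss(0.0, sigma) for _ in range(n_models)]
        averages.append(sum(predictions) / n_models)
    # Variance of the averaged prediction across repeated trials
    return statistics.pvariance(averages)


var_single = variance_of_average(1)
var_ensemble = variance_of_average(25)
print(f"single model: {var_single:.3f}, average of 25 models: {var_ensemble:.3f}")
```

With independent predictions, the second number should be close to 1/25 of the first. This is the motivation behind averaging many trees instead of trusting one.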

This question leads us to a set of methods that either combine different machine learning algorithms (heterogeneous ensembles) or use many instances of a single algorithm (homogeneous ensembles) to construct forests and jungles of trees. These methods go by different names, which we will try to explain here. They are

Voting classifiers
Bagging and Pasting
Random forests
Boosting methods, from adaptive to Extreme Gradient Boosting (XGBoost)

We discuss these methods here.
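As a rough orientation before the details, the sketch below fits one representative of each family with scikit-learn and compares test accuracy against a single tree. The data set (`make_moons`), the hyperparameters and the dictionary names `models` and `scores` are illustrative assumptions, not choices taken from these notes. XGBoost lives in the separate `xgboost` package, so scikit-learn's `GradientBoostingClassifier` is used here as a stand-in for gradient boosting.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy two-class data set (an assumption made for illustration)
X, y = make_moons(n_samples=500, noise=0.3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

models = {
    # Baseline: one fully grown tree, prone to overfitting
    "single tree": DecisionTreeClassifier(random_state=42),
    # Voting: majority vote over different (heterogeneous) algorithms
    "voting": VotingClassifier(
        estimators=[
            ("lr", LogisticRegression()),
            ("tree", DecisionTreeClassifier(random_state=42)),
            ("rf", RandomForestClassifier(n_estimators=50, random_state=42)),
        ],
        voting="hard",
    ),
    # Bagging: same algorithm, training subsets sampled with replacement
    "bagging": BaggingClassifier(
        DecisionTreeClassifier(), n_estimators=100, bootstrap=True, random_state=42
    ),
    # Pasting: same algorithm, training subsets sampled without replacement
    "pasting": BaggingClassifier(
        DecisionTreeClassifier(),
        n_estimators=100,
        max_samples=0.5,
        bootstrap=False,
        random_state=42,
    ),
    # Random forest: bagged trees with random feature selection at each split
    "random forest": RandomForestClassifier(n_estimators=100, random_state=42),
    # Boosting: models are added sequentially, each correcting its predecessors
    "AdaBoost": AdaBoostClassifier(n_estimators=100, random_state=42),
    "gradient boosting": GradientBoostingClassifier(n_estimators=100, random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # accuracy on held-out data
    print(f"{name:18s} test accuracy = {scores[name]:.3f}")
```

The exact accuracies depend on the data and the random seed, so they should not be read as a ranking of the methods.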