Training Tips and Variants
- Preprocess the time series (normalize features, slice into fixed-length windows); handle variable-length sequences with padding or truncation (see the preprocessing sketch after this list).
- Experiment with network depth, hidden units, and regularization (dropout) to avoid overfitting.
- Consider a bidirectional LSTM or stacking multiple LSTM layers for complex patterns; a combined sketch with dropout follows this list.
- GRU is a simpler gated RNN that merges the LSTM's forget and input gates into a single update gate (see the drop-in snippet below).
- Monitor gradient norms during training; use gradient clipping to stabilize learning if gradients explode (see the final snippet below).
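
A minimal preprocessing sketch, assuming PyTorch and NumPy and a univariate series; the function `make_windows` and the window/horizon parameters are illustrative names, not from any particular library:

```python
import numpy as np
import torch
from torch.nn.utils.rnn import pad_sequence

def make_windows(series: np.ndarray, window_size: int, horizon: int = 1):
    """Slice a 1-D series into (input window, future target) pairs."""
    xs, ys = [], []
    for start in range(len(series) - window_size - horizon + 1):
        xs.append(series[start : start + window_size])
        ys.append(series[start + window_size + horizon - 1])
    return np.stack(xs), np.array(ys)

series = np.sin(np.linspace(0, 20, 500))          # toy series (assumption)
series = (series - series.mean()) / series.std()  # z-score normalization
x, y = make_windows(series, window_size=30)

# Variable-length sequences: pad each batch to its longest sequence.
seqs = [torch.randn(n, 1) for n in (10, 25, 17)]  # three sequences, 1 feature
padded = pad_sequence(seqs, batch_first=True)     # shape (3, 25, 1)
print(x.shape, y.shape, padded.shape)
```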
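
One way to combine stacking, bidirectionality, and dropout in a single model, sketched in PyTorch; the class name and hyperparameters (two layers, 64 hidden units, dropout 0.2) are illustrative assumptions:

```python
import torch
import torch.nn as nn

class StackedBiLSTM(nn.Module):
    """Two stacked bidirectional LSTM layers with inter-layer dropout."""
    def __init__(self, n_features: int, hidden: int = 64, out: int = 1):
        super().__init__()
        self.lstm = nn.LSTM(
            input_size=n_features,
            hidden_size=hidden,
            num_layers=2,        # stacked LSTM layers
            dropout=0.2,         # dropout applied between the layers
            bidirectional=True,  # forward + backward pass over the sequence
            batch_first=True,
        )
        # Bidirectional output concatenates both directions: 2 * hidden.
        self.head = nn.Linear(2 * hidden, out)

    def forward(self, x):
        output, _ = self.lstm(x)         # (batch, seq, 2 * hidden)
        return self.head(output[:, -1])  # predict from the last time step

model = StackedBiLSTM(n_features=1)
print(model(torch.randn(8, 30, 1)).shape)  # (8, 1)
```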
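
In PyTorch, a GRU is essentially a drop-in replacement for an LSTM; it keeps a single hidden state rather than the LSTM's (hidden, cell) pair. The toy shapes below are assumptions carried over from the sketches above:

```python
import torch
import torch.nn as nn

# The GRU's update gate z_t interpolates between the previous hidden state
# and the candidate state, so one gate plays the role of both the forget
# and input gates of an LSTM. No separate cell state is kept, so the GRU
# returns (output, h_n) instead of (output, (h_n, c_n)).
gru = nn.GRU(input_size=1, hidden_size=64, num_layers=2, batch_first=True)
x = torch.randn(8, 30, 1)
output, h_n = gru(x)
print(output.shape, h_n.shape)  # (8, 30, 64), (2, 8, 64)
```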
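
A sketch of monitoring and clipping gradient norms in a PyTorch training step; `clip_grad_norm_` rescales gradients in place and returns the pre-clipping total norm, which doubles as a monitoring signal. The threshold `max_norm=1.0` is a common but illustrative choice:

```python
import torch
import torch.nn as nn

model = nn.LSTM(input_size=1, hidden_size=64, batch_first=True)
head = nn.Linear(64, 1)
params = list(model.parameters()) + list(head.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.randn(8, 30, 1)  # toy batch (assumption)
y = torch.randn(8, 1)

optimizer.zero_grad()
output, _ = model(x)
loss = loss_fn(head(output[:, -1]), y)
loss.backward()

# Returns the total gradient norm before clipping; log it to spot
# exploding gradients early.
total_norm = torch.nn.utils.clip_grad_norm_(params, max_norm=1.0)
print(f"grad norm before clipping: {total_norm:.3f}")

optimizer.step()
```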