LSTM Summary

  1. LSTMs extend plain RNNs with gated memory cells that retain long-term context, mitigating the vanishing- and exploding-gradient problems that limit standard RNNs on long sequences.
  2. Core update: \( C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \), output \( h_t = o_t \odot \tanh(C_t) \), where \( f_t \), \( i_t \), and \( o_t \) are the forget, input, and output gates and \( \tilde{C}_t \) is the candidate cell state (see the step-by-step sketch after this list).
  3. Implementation is straightforward in libraries such as Keras or PyTorch, taking only a few lines of code (see the PyTorch sketch after this list).
  4. Applications span science and engineering: forecasting dynamical systems, analyzing DNA/proteins, etc.
  5. For more details, see Goodfellow et al. (2016), Deep Learning, chapter 10.
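
To make item 2 concrete, here is a minimal sketch of one LSTM time step written directly from the gate equations in NumPy; the weight names (W_f, W_i, W_C, W_o), the concatenated \([h_{t-1}, x_t]\) layout, and the tiny random dimensions are illustrative assumptions, not any particular library's internals.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell_step(x_t, h_prev, C_prev, params):
    """One LSTM time step: gates -> cell state C_t -> hidden state h_t.

    x_t:    input at time t, shape (n_in,)
    h_prev: previous hidden state h_{t-1}, shape (n_hidden,)
    C_prev: previous cell state C_{t-1}, shape (n_hidden,)
    params: illustrative weights W_* of shape (n_hidden, n_in + n_hidden)
            and biases b_* of shape (n_hidden,)
    """
    z = np.concatenate([h_prev, x_t])                       # combined input [h_{t-1}, x_t]
    f_t = sigmoid(params["W_f"] @ z + params["b_f"])        # forget gate
    i_t = sigmoid(params["W_i"] @ z + params["b_i"])        # input gate
    C_tilde = np.tanh(params["W_C"] @ z + params["b_C"])    # candidate cell state
    o_t = sigmoid(params["W_o"] @ z + params["b_o"])        # output gate
    C_t = f_t * C_prev + i_t * C_tilde                      # C_t = f_t . C_{t-1} + i_t . C~_t
    h_t = o_t * np.tanh(C_t)                                # h_t = o_t . tanh(C_t)
    return h_t, C_t

# Tiny demo with random weights (for illustration only, not trained values).
rng = np.random.default_rng(0)
n_in, n_hidden = 3, 4
params = {}
for name in ("f", "i", "C", "o"):
    params[f"W_{name}"] = 0.1 * rng.standard_normal((n_hidden, n_in + n_hidden))
    params[f"b_{name}"] = np.zeros(n_hidden)
h, C = np.zeros(n_hidden), np.zeros(n_hidden)
h, C = lstm_cell_step(rng.standard_normal(n_in), h, C, params)
print(h.shape, C.shape)  # (4,) (4,)
```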
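As a sketch of the kind of short implementation item 3 refers to, the example below wraps torch.nn.LSTM in a small PyTorch model; the class name, layer sizes, and the last-time-step readout are arbitrary choices for illustration rather than a prescribed architecture.

```python
import torch
import torch.nn as nn

class SequenceRegressor(nn.Module):
    """Minimal LSTM model: one LSTM layer plus a linear readout of the last hidden state."""
    def __init__(self, n_features=8, n_hidden=32, n_out=1):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_features, hidden_size=n_hidden, batch_first=True)
        self.head = nn.Linear(n_hidden, n_out)

    def forward(self, x):                  # x: (batch, time, n_features)
        out, (h_n, c_n) = self.lstm(x)     # out: (batch, time, n_hidden)
        return self.head(out[:, -1, :])    # predict from the final time step

model = SequenceRegressor()
x = torch.randn(16, 50, 8)                 # batch of 16 sequences, 50 steps each
y_hat = model(x)
print(y_hat.shape)                         # torch.Size([16, 1])
```

Using batch_first=True keeps the data in (batch, time, features) order, which tends to be the more natural layout when feeding windowed time-series data.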