Adaptive Boosting, AdaBoost

In our iterative procedure we thus define

$$ f_m(x) = f_{m-1}(x)+\beta_mG_m(x). $$
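To make the additive structure concrete, here is a minimal sketch (with hypothetical weak learners and weights, not a full AdaBoost fit) of how \( f_m(x) \) accumulates the weighted contributions \( \beta_m G_m(x) \):

```python
import numpy as np

# Sketch of the staged additive model f_m(x) = f_{m-1}(x) + beta_m * G_m(x),
# using hypothetical decision stumps as the weak classifiers G_m.

def stump(threshold):
    """A decision stump as weak classifier G(x), returning -1 or +1."""
    return lambda x: np.where(x > threshold, 1.0, -1.0)

weak_learners = [stump(0.0), stump(0.5)]   # G_1, G_2 (assumed toy learners)
betas = [0.7, 0.3]                         # beta_1, beta_2 (assumed weights)

def f(x, m):
    """Evaluate f_m(x) by summing the first m weighted weak learners."""
    total = np.zeros_like(x, dtype=float)
    for beta, G in zip(betas[:m], weak_learners[:m]):
        total += beta * G(x)
    return total

x = np.array([-1.0, 0.2, 0.8])
print(f(x, 1))   # f_1(x) = beta_1 * G_1(x)
print(f(x, 2))   # f_2(x) = f_1(x) + beta_2 * G_2(x)
```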

The simplest possible cost function which leads to the AdaBoost algorithm (simple also from a computational point of view) is the exponential cost/loss function, defined as

$$ C(\boldsymbol{y},\boldsymbol{f}) = \sum_{i=0}^{n-1}\exp{(-y_i(f_{m-1}(x_i)+\beta G(x_i)))}. $$
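As an illustration, here is a short sketch (toy data, hypothetical names) of evaluating this exponential cost for a candidate \( \beta \) and the predictions of a candidate weak classifier \( G \), with labels \( y_i \in \{-1,+1\} \):

```python
import numpy as np

def exponential_cost(y, f_prev, beta, G_x):
    """Exponential loss sum_i exp(-y_i (f_{m-1}(x_i) + beta * G(x_i)))."""
    return np.sum(np.exp(-y * (f_prev + beta * G_x)))

# Toy values: previous ensemble outputs f_{m-1}(x_i) and a candidate
# weak classifier's predictions G(x_i) in {-1, +1}
y      = np.array([ 1, -1,  1,  1])
f_prev = np.array([ 0.5, -0.2, -0.1, 0.8])
G_x    = np.array([ 1, -1, -1,  1])

print(exponential_cost(y, f_prev, beta=0.4, G_x=G_x))
```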

We optimize \( \beta \) and \( G \) for each value of \( m=1,2,\dots,M \), as we did in the regression case. This is normally done in two steps. Let us, however, first rewrite the cost function as

$$ C(\boldsymbol{y},\boldsymbol{f}) = \sum_{i=0}^{n-1}w_i^{m}\exp{(-y_i\beta G(x_i))}, $$

where we have defined \( w_i^m= \exp{(-y_if_{m-1}(x_i))} \). Since \( w_i^m \) depends neither on \( \beta \) nor on \( G \), it acts as a fixed weight applied to each observation in step \( m \).
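A quick numerical check (same hypothetical toy values as above) that the weighted form agrees with the direct form of the cost:

```python
import numpy as np

# Verify: with w_i^m = exp(-y_i f_{m-1}(x_i)),
#   sum_i exp(-y_i (f_{m-1}(x_i) + beta G(x_i)))
#     = sum_i w_i^m exp(-y_i beta G(x_i))

y      = np.array([ 1, -1,  1,  1])
f_prev = np.array([ 0.5, -0.2, -0.1, 0.8])
G_x    = np.array([ 1, -1, -1,  1])
beta   = 0.4

direct   = np.sum(np.exp(-y * (f_prev + beta * G_x)))
weights  = np.exp(-y * f_prev)           # w_i^m, fixed during step m
weighted = np.sum(weights * np.exp(-y * beta * G_x))

print(np.isclose(direct, weighted))      # True
```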