Adaptive boosting: AdaBoost, Basic Algorithm

The algorithm here is rather straightforward. Assume that our weak classifier is a decision tree and that we consider a binary set of outputs with \( y_i \in \{-1,1\} \) and \( i=0,1,2,\dots,n-1 \) as our set of observations. Our design matrix is given in terms of the feature/predictor vectors \( \boldsymbol{X}=[\boldsymbol{x}_0\,\boldsymbol{x}_1\,\dots\,\boldsymbol{x}_{p-1}] \). Finally, we also define a classifier determined by our data via a function \( G(x) \). This function tells us how well we are able to classify our outputs/targets \( \boldsymbol{y} \).
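
As a minimal sketch of this setup (assuming scikit-learn and purely synthetic data, both introduced here for illustration), the weak classifier can be taken to be a decision stump, that is, a decision tree restricted to a single split:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, purely for illustration: n observations, p features,
# and binary targets y_i in {-1, 1}.
rng = np.random.default_rng(2021)
n, p = 100, 2
X = rng.normal(size=(n, p))                   # design matrix X = [x_0 x_1 ... x_{p-1}]
y = np.where(X[:, 0] + X[:, 1] > 0.0, 1, -1)  # targets y_i in {-1, 1}

# The weak classifier G(x): a decision tree limited to depth one (a stump).
weak = DecisionTreeClassifier(max_depth=1)
weak.fit(X, y)
G = weak.predict(X)                           # G(x_i) for each observation
```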

We have already defined the misclassification error \( \mathrm{err} \) as

$$ \mathrm{err}=\frac{1}{n}\sum_{i=0}^{n-1}I(y_i\ne G(x_i)), $$

where the indicator function \( I() \) equals one if we misclassify a given observation and zero if we classify it correctly.
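
In code, this error is simply the fraction of mismatched predictions. Continuing the hypothetical stump sketch above (reusing its arrays \( \boldsymbol{y} \) and `G`):

```python
# err = (1/n) * sum_i I(y_i != G(x_i)): the fraction of misclassified points.
err = np.mean(y != G)
print(f"Misclassification error: {err:.3f}")
```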