First, for any fixed \beta > 0, we can optimize over G by setting
G_m = \arg\min_G \sum_{i=0}^{n-1} w_i^m I(y_i \ne G(x_i)), which is the classifier that minimizes the weighted error rate in predicting y .
We can see this by rewriting the criterion \sum_{i=0}^{n-1} w_i^m \exp(-\beta y_i G(x_i)) as
\exp(-\beta)\sum_{y_i = G(x_i)} w_i^m + \exp(\beta)\sum_{y_i \ne G(x_i)} w_i^m, which can in turn be rewritten as
(\exp(\beta) - \exp(-\beta))\sum_{i=0}^{n-1} w_i^m I(y_i \ne G(x_i)) + \exp(-\beta)\sum_{i=0}^{n-1} w_i^m.
Only the first term depends on G, so the same G_m minimizes this criterion for every \beta > 0.
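Holding G_m fixed, we can then solve for \beta: differentiating the expression above with respect to \beta and setting the derivative to zero gives
(\exp(\beta) + \exp(-\beta))\sum_{i=0}^{n-1} w_i^m I(y_i \ne G_m(x_i)) - \exp(-\beta)\sum_{i=0}^{n-1} w_i^m = 0.
Dividing through by \exp(-\beta)\sum_{i=0}^{n-1} w_i^m and taking logarithms leads to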
\beta_m = \frac{1}{2}\log{\frac{1-\mathrm{\overline{err}}_m}{\mathrm{\overline{err}}_m}}, where we have redefined the error as the weighted error rate
\mathrm{\overline{err}}_m = \frac{\sum_{i=0}^{n-1} w_i^m I(y_i \ne G_m(x_i))}{\sum_{i=0}^{n-1} w_i^m}.
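As a quick check of scale: \mathrm{\overline{err}}_m = 0.25 gives \beta_m = \frac{1}{2}\log 3 \approx 0.55, while a classifier barely better than chance (\mathrm{\overline{err}}_m close to 1/2) receives a coefficient close to zero, so more accurate classifiers contribute more to the additive model. The update is then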
f_m(x) = f_{m-1}(x) + \beta_m G_m(x), which leads to the new weights
w_i^{m+1} = w_i^m \exp(-\beta_m y_i G_m(x_i)).
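Putting the steps together, here is a minimal sketch of the resulting boosting loop. The use of scikit-learn decision stumps as the weak learners G_m, the number of rounds M, and the early stop when the weighted error reaches 0 or 1/2 are choices of this sketch rather than anything fixed by the derivation; labels are assumed to be y_i \in \{-1, +1\}.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, M=50):
    """Fit f_M(x) = sum_m beta_m G_m(x); labels y must be in {-1, +1}."""
    y = np.asarray(y)
    n = y.shape[0]
    w = np.full(n, 1.0 / n)                      # uniform initial weights w_i^1
    learners, betas = [], []
    for m in range(M):
        # G_m: weak classifier fit to minimize the weighted error (a stump here)
        G = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = G.predict(X)
        err = w[pred != y].sum() / w.sum()       # weighted error rate err_m
        if err <= 0.0 or err >= 0.5:             # perfect or chance-level: stop early
            break
        beta = 0.5 * np.log((1.0 - err) / err)   # beta_m = (1/2) log((1 - err)/err)
        w = w * np.exp(-beta * y * pred)         # w_i^{m+1} = w_i^m exp(-beta_m y_i G_m(x_i))
        w = w / w.sum()                          # renormalize for numerical stability
        learners.append(G)
        betas.append(beta)
    return learners, betas

def adaboost_predict(X, learners, betas):
    """Classify by the sign of the additive model f_M."""
    f = sum(beta * G.predict(X) for G, beta in zip(learners, betas))
    return np.sign(f)
```

Renormalizing w each round does not change G_m or \beta_m, since both depend only on the relative weights.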