Gradient Boosting, algorithm

Plain steepest descent is, however, of limited use here, since it only optimizes f at the fixed set of n training points, so we do not learn a function that can generalize. We can instead modify the algorithm by fitting a weak learner to approximate the negative gradient signal at each step.

Suppose we have a cost function C(\boldsymbol{y},\boldsymbol{f})=\sum_{i=0}^{n-1}L(y_i, f(x_i)), where y_i is our target and f(x_i) is the value of the function meant to model y_i. This cost function could for instance be our standard squared-error function

C(\boldsymbol{y},\boldsymbol{f})=\sum_{i=0}^{n-1}(y_i-f(x_i))^2.
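For this particular cost function the negative gradient with respect to f(x_i) is, up to an overall factor of two, just the residual,

u_i = -\partial C(\boldsymbol{y},\boldsymbol{f})/\partial f(x_i) = 2\left(y_i - f(x_i)\right),

so each iteration of the algorithm below effectively fits the base learner to the current residuals.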

The way we proceed in an iterative fashion is to

  1. Initialize our estimate f_0(x).
  2. For m=1:M, we
    1. compute the negative gradient vector \boldsymbol{u}_m = -\partial C(\boldsymbol{y},\boldsymbol{f})/\partial \boldsymbol{f}(x) evaluated at f(x) = f_{m-1}(x);
    2. fit the so-called base learner h_m(u_m,x) to the negative gradient, that is, train it with \boldsymbol{u}_m as the target;
    3. update the estimate f_m(x) = f_{m-1}(x) + h_m(u_m,x).
  3. The final estimate is then f_M(x) = f_0(x) + \sum_{m=1}^M h_m(u_m,x). A minimal code sketch of these steps follows below.
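
As an illustration, here is a minimal sketch of the loop above for the squared-error cost, where the negative gradient is simply the current residual. The shallow regression trees from scikit-learn used as base learners, the toy data, the number of iterations M and the shrinkage factor nu are choices made for this example only and are not prescribed by the algorithm itself.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2021)

# Toy one-dimensional regression data (assumed here for illustration only)
n = 200
x = np.linspace(0.0, 1.0, n).reshape(-1, 1)
y = np.sin(4.0 * np.pi * x).ravel() + 0.3 * rng.normal(size=n)

M = 100          # number of boosting iterations
nu = 0.1         # shrinkage (learning rate); common in practice, not part of the algorithm above
max_depth = 3    # depth of each base learner

# Step 1: initialize f_0(x), here simply as the mean of the targets
f0 = y.mean()
f = np.full(n, f0)
learners = []

# Step 2: for m = 1, ..., M
for m in range(M):
    # 2.1: negative gradient of the squared-error cost, i.e. the residuals
    #      (the overall factor of two is absorbed into the base learner)
    u = y - f
    # 2.2: fit the base learner h_m to the negative gradient
    h = DecisionTreeRegressor(max_depth=max_depth)
    h.fit(x, u)
    learners.append(h)
    # 2.3: update the estimate f_m(x) = f_{m-1}(x) + nu * h_m(u_m, x)
    f += nu * h.predict(x)

# Step 3: the final estimate is f_0 plus the sum of all base-learner contributions
def predict(x_new):
    return f0 + nu * np.sum([h.predict(x_new) for h in learners], axis=0)

print("training MSE:", np.mean((y - predict(x)) ** 2))

Increasing M or the tree depth reduces the training error further, at the risk of overfitting; the shrinkage factor nu trades off the contribution of each base learner against the number of iterations needed.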