Iterative Fitting, Regression and Squared-error Cost Function

We proceed as follows, specializing here to the squared-error cost function:

  1. Establish a cost function, here \( {\cal C}(\boldsymbol{y},\boldsymbol{f}) = \frac{1}{n} \sum_{i=0}^{n-1}(y_i-f_M(x_i))^2 \) with \( f_M(x) = \sum_{m=1}^M \beta_m b(x;\gamma_m) \).
  2. Initialize with a guess \( f_0(x) \). This could be a constant such as one or zero, or some random numbers.
  3. For \( m=1,2,\dots,M \):
    1. Minimize \( \sum_{i=0}^{n-1}(y_i-f_{m-1}(x_i)-\beta b(x_i;\gamma))^2 \) with respect to \( \gamma \) and \( \beta \).
    2. This gives the optimal values \( \beta_m \) and \( \gamma_m \).
    3. Then update the model, \( f_m(x)=f_{m-1}(x) +\beta_m b(x;\gamma_m) \).
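
To make the minimization in step 3 more concrete, it may help to note that for a fixed \( \gamma \) the optimal \( \beta \) has a closed form. Writing the current residuals as \( r_i = y_i - f_{m-1}(x_i) \) (a shorthand introduced here, not used elsewhere in these notes), the quantity to minimize is \( \sum_{i=0}^{n-1}(r_i-\beta b(x_i;\gamma))^2 \). Setting its derivative with respect to \( \beta \) to zero gives, assuming the denominator is nonzero,

\[
\beta(\gamma) = \frac{\sum_{i=0}^{n-1} r_i\, b(x_i;\gamma)}{\sum_{i=0}^{n-1} b(x_i;\gamma)^2}.
\]

Each iteration therefore amounts to fitting the basis function to the residuals of the previous model, and only the search over \( \gamma \) generally requires a numerical procedure.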

We could use any of the algorithms we have discussed so far to provide the basis function \( b(x;\gamma) \). If we use trees, \( \gamma \) parameterizes the split variables and split points at the internal nodes, and the predictions at the terminal nodes.
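
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the basis function is taken to be a tree with a single split (a stump) on one-dimensional input, \( f_0(x)=0 \), and the data are synthetic. The names `fit_stump`, `predict_stump` and `iterative_fit` are hypothetical helpers introduced only for this example. For a stump, \( \gamma \) is the split point together with the two leaf values, and \( \beta \) is absorbed into the leaf values.

```python
import numpy as np

def fit_stump(x, r):
    """Least-squares fit of a one-split tree (stump) to the residuals r.

    Returns gamma = (split point, left leaf value, right leaf value).
    """
    best = None
    xs = np.unique(x)
    # Candidate split points: midpoints between consecutive distinct x values
    for s in (xs[:-1] + xs[1:]) / 2:
        left = x <= s
        # For a fixed split, the optimal leaf values are the residual means
        c_left, c_right = r[left].mean(), r[~left].mean()
        sse = np.sum((r - np.where(left, c_left, c_right)) ** 2)
        if best is None or sse < best[0]:
            best = (sse, s, c_left, c_right)
    return best[1:]

def predict_stump(gamma, x):
    s, c_left, c_right = gamma
    return np.where(x <= s, c_left, c_right)

def iterative_fit(x, y, M=50):
    f = np.zeros_like(y, dtype=float)      # step 2: initial guess f_0(x) = 0
    stumps, costs = [], []
    for m in range(M):                     # step 3
        r = y - f                          # residuals y_i - f_{m-1}(x_i)
        gamma = fit_stump(x, r)            # steps 3.1 and 3.2
        f = f + predict_stump(gamma, x)    # step 3.3: f_m = f_{m-1} + new term
        stumps.append(gamma)
        costs.append(np.mean((y - f) ** 2))  # squared-error cost after step m
    return stumps, costs

# Synthetic data (an assumption made for this illustration only)
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 100))
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=100)

stumps, costs = iterative_fit(x, y, M=50)
print(f"cost after 1 step: {costs[0]:.4f}, after 50 steps: {costs[-1]:.4f}")
```

Because each stump is the least-squares fit to the current residuals, the training cost cannot increase from one iteration to the next. The cost on unseen data may behave differently, so in practice \( M \) would be chosen with a validation set.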