To better understand what happens, let us work out the steps of the iterative fitting using the squared error function defined above.
For simplicity we also assume that our basis functions are simply \( b(x;\gamma)=1+\gamma x \).
This means that for every iteration \( m \), we need to optimize
$$ (\beta_m,\gamma_m) = \mathrm{argmin}_{\beta,\gamma}\hspace{0.1cm} \sum_{i=0}^{n-1}(y_i-f_{m-1}(x_i)-\beta b(x_i;\gamma))^2=\sum_{i=0}^{n-1}(y_i-f_{m-1}(x_i)-\beta(1+\gamma x_i))^2. $$We start our iteration by simply setting \( f_0(x)=0 \). Taking the derivatives with respect to \( \beta \) and \( \gamma \) we obtain
$$ \frac{\partial {\cal C}}{\partial \beta} = -2\sum_{i}(1+\gamma x_i)(y_i-\beta(1+\gamma x_i))=0, $$and
$$ \frac{\partial {\cal C}}{\partial \gamma} =-2\sum_{i}\beta x_i(y_i-\beta(1+\gamma x_i))=0. $$We can then rewrite these equations as (defining \( \boldsymbol{w}=\boldsymbol{e}+\gamma \boldsymbol{x} \), with \( \boldsymbol{e} \) being the vector with all elements equal to one)
$$ \boldsymbol{w}^T(\boldsymbol{y}-\beta \boldsymbol{w})=0, $$which gives us \( \beta = \boldsymbol{w}^T\boldsymbol{y}/(\boldsymbol{w}^T\boldsymbol{w}) \). Similarly we have
$$ \beta \boldsymbol{x}^T(\boldsymbol{y}-\beta(\boldsymbol{e}+\gamma \boldsymbol{x}))=0, $$which leads to \( \gamma =(\boldsymbol{x}^T\boldsymbol{y}-\beta\boldsymbol{x}^T\boldsymbol{e})/(\beta\boldsymbol{x}^T\boldsymbol{x}) \). Inserting the above expression for \( \beta \) gives us a single equation for \( \gamma \). This is a non-linear equation in the unknown \( \gamma \) and has to be solved numerically.
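As an illustration, the following is a minimal sketch (not code from these notes) of how this non-linear equation can be solved for the first iteration: we eliminate \( \beta \) through \( \beta(\gamma) = \boldsymbol{w}^T\boldsymbol{y}/(\boldsymbol{w}^T\boldsymbol{w}) \) and find the root of the remaining equation in \( \gamma \) with scipy.optimize.fsolve. The data set and the starting guess are made-up placeholders.

```python
import numpy as np
from scipy.optimize import fsolve

# Made-up data, only to have something runnable
rng = np.random.default_rng(2021)
x = np.linspace(0, 1, 30)
y = 2.0 + 3.0*x + 0.1*rng.standard_normal(x.size)

def beta_of_gamma(gamma, x, r):
    """beta = w^T r/(w^T w) with w = e + gamma*x and r the current residuals."""
    w = 1.0 + gamma*x
    return (w @ r)/(w @ w)

def gamma_equation(gamma, x, r):
    """Non-linear equation in gamma after inserting beta(gamma): x^T (r - beta*w) = 0."""
    w = 1.0 + gamma*x
    beta = beta_of_gamma(gamma, x, r)
    return x @ (r - beta*w)

r = y.copy()                                  # residuals with f_0(x) = 0
gamma1 = fsolve(gamma_equation, x0=1.0, args=(x, r))[0]
beta1 = beta_of_gamma(gamma1, x, r)
print(f"beta_1 = {beta1:.4f}, gamma_1 = {gamma1:.4f}")
```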
The solution of these two equations gives us \( \beta_1 \) and \( \gamma_1 \), leading to the new expression \( f_1(x) = \beta_1(1+\gamma_1x) \). For the subsequent iterations we repeat exactly the same steps, with the residuals \( y_i-f_{m-1}(x_i) \) playing the role of \( y_i \). Doing this \( M \) times results in our final estimate for the function \( f \).
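A sketch of the full iteration could then look as follows. This is again an assumed implementation rather than the notes' own code, and it reuses the helper functions beta_of_gamma and gamma_equation and the made-up data from the snippet above; at each step \( m \) we solve the same two equations, but for the current residuals.

```python
def boost_simple_linear(x, y, M=10):
    """Greedy fit of f(x) = sum_m beta_m*(1 + gamma_m*x) to squared-error residuals."""
    f = np.zeros_like(y)                       # f_0(x) = 0
    params = []
    for m in range(M):
        r = y - f                              # residuals y_i - f_{m-1}(x_i)
        gamma = fsolve(gamma_equation, x0=1.0, args=(x, r))[0]
        beta = beta_of_gamma(gamma, x, r)
        f = f + beta*(1.0 + gamma*x)           # f_m(x) = f_{m-1}(x) + beta_m*b(x; gamma_m)
        params.append((beta, gamma))
    return f, params

f_M, params = boost_simple_linear(x, y, M=5)
```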