To better understand what happens, let us develop the steps for the iterative fitting using the above squared error function.
For simplicity we also assume that our basis functions take the form b(x;\gamma)=1+\gamma x.
This means that for every iteration m we need to optimize

(\beta_m,\gamma_m) = \mathrm{argmin}_{\beta,\gamma}\hspace{0.1cm} \sum_{i=0}^{n-1}(y_i-f_{m-1}(x_i)-\beta b(x_i;\gamma))^2=\mathrm{argmin}_{\beta,\gamma}\hspace{0.1cm}\sum_{i=0}^{n-1}(y_i-f_{m-1}(x_i)-\beta(1+\gamma x_i))^2.

We start our iteration by simply setting f_0(x)=0, so that the cost function for the first iteration reads

{\cal C}(\beta,\gamma) = \sum_{i=0}^{n-1}(y_i-\beta(1+\gamma x_i))^2.

Taking the derivatives with respect to \beta and \gamma we obtain
\frac{\partial {\cal C}}{\partial \beta} = -2\sum_{i}(1+\gamma x_i)(y_i-\beta(1+\gamma x_i))=0,

and
\frac{\partial {\cal C}}{\partial \gamma} = -2\sum_{i}\beta x_i(y_i-\beta(1+\gamma x_i))=0.
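As a quick sanity check (a sketch, not part of the original text), the two derivatives can be verified symbolically with SymPy on a small illustrative sample:

```python
import sympy as sp

# Symbolic check of the derivative equations for the first iteration,
# where f_0(x) = 0; three data points are enough to illustrate.
beta, gamma = sp.symbols('beta gamma')
xs = sp.symbols('x0:3')
ys = sp.symbols('y0:3')

# Cost function C(beta, gamma) = sum_i (y_i - beta*(1 + gamma*x_i))^2
C = sum((y - beta*(1 + gamma*x))**2 for x, y in zip(xs, ys))

# The derivative expressions quoted in the text
dC_dbeta_text = -2*sum((1 + gamma*x)*(y - beta*(1 + gamma*x))
                       for x, y in zip(xs, ys))
dC_dgamma_text = -2*sum(beta*x*(y - beta*(1 + gamma*x))
                        for x, y in zip(xs, ys))

# Both differences simplify to zero
print(sp.simplify(sp.diff(C, beta) - dC_dbeta_text))    # -> 0
print(sp.simplify(sp.diff(C, gamma) - dC_dgamma_text))  # -> 0
```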
We can then rewrite these equations in vector form. Defining \boldsymbol{w}=\boldsymbol{e}+\gamma \boldsymbol{x}, with \boldsymbol{e} the vector whose elements are all equal to one, the first equation becomes

\boldsymbol{w}^T(\boldsymbol{y}-\beta \boldsymbol{w})=0,

which gives us \beta = \boldsymbol{w}^T\boldsymbol{y}/(\boldsymbol{w}^T\boldsymbol{w}). Similarly, the second equation can be written as
\beta\boldsymbol{x}^T(\boldsymbol{y}-\beta(\boldsymbol{e}+\gamma \boldsymbol{x}))=0,

which leads to \gamma =(\boldsymbol{x}^T\boldsymbol{y}-\beta\boldsymbol{x}^T\boldsymbol{e})/(\beta\boldsymbol{x}^T\boldsymbol{x}). Inserting the expression for \beta gives us a single equation for \gamma alone. This equation is non-linear in the unknown \gamma and has to be solved numerically.
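To make this last step concrete, here is a minimal numerical sketch (not part of the original derivation) using NumPy and SciPy's fsolve; the function name solve_beta_gamma and the starting guess gamma0 are our own choices. For a trial value of \gamma we evaluate \beta in closed form and hand the remaining equation for \gamma to the root finder.

```python
import numpy as np
from scipy.optimize import fsolve

def solve_beta_gamma(x, y, gamma0=0.1):
    """Solve the coupled equations for (beta, gamma) at one step.

    For a given gamma, beta follows in closed form from
    beta = w^T y / (w^T w) with w = e + gamma*x. Substituting this
    into the gamma-equation leaves one non-linear equation in gamma,
    which is solved numerically. Assumes beta != 0, as in the text.
    """
    e = np.ones_like(x)

    def beta_of(gamma):
        w = e + gamma * x
        return (w @ y) / (w @ w)

    def gamma_residual(gamma):
        beta = beta_of(gamma)
        # gamma - (x^T y - beta x^T e) / (beta x^T x) = 0
        return gamma - (x @ y - beta * (x @ e)) / (beta * (x @ x))

    gamma = fsolve(gamma_residual, x0=gamma0)[0]
    return beta_of(gamma), gamma
```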
The solution of these two equations gives us \beta_1 and \gamma_1, leading to the new expression for f_1(x) as f_1(x) = \beta_1(1+\gamma_1 x). For the subsequent iterations the same equations apply, with y_i replaced by the current residuals y_i-f_{m-1}(x_i). Doing this M times results in our final estimate for the function f.
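Putting everything together, a sketch of the full iterative fit could look as follows; it reuses the hypothetical solve_beta_gamma from above and, at every step, fits the new basis function to the current residuals.

```python
import numpy as np

def boost_fit(x, y, M, gamma0=0.1):
    """Iterative fit: f_m(x) = f_{m-1}(x) + beta_m*(1 + gamma_m*x).

    Starts from f_0 = 0 and at every step solves the least-squares
    problem for (beta_m, gamma_m) on the current residuals.
    """
    params = []                           # fitted pairs (beta_m, gamma_m)
    f = np.zeros_like(y, dtype=float)     # f_0(x_i) = 0
    for _ in range(M):
        beta, gamma = solve_beta_gamma(x, y - f, gamma0)
        f = f + beta * (1.0 + gamma * x)
        params.append((beta, gamma))
    return params, f
```

Calling, for example, params, f_hat = boost_fit(x, y, M=5) returns the five fitted pairs (\beta_m,\gamma_m) together with the final predictions f_M(x_i) at the data points.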