The Squared-Error again! Steepest Descent

We start again with our cost function \( {\cal C}(\boldsymbol{y}m\boldsymbol{f})=\sum_{i=0}^{n-1}{\cal L}(y_i, f(x_i)) \) where we want to minimize This means that for every iteration, we need to optimize

$$ (\hat{\boldsymbol{f}}) = \mathrm{argmin}_{\boldsymbol{f}}\hspace{0.1cm} \sum_{i=0}^{n-1}(y_i-f(x_i))^2. $$

We define a real function \( h_m(x) \) that defines our final function \( f_M(x) \) as

$$ f_M(x) = \sum_{m=0}^M h_m(x). $$

In the steepest decent approach we approximate \( h_m(x) = -\rho_m g_m(x) \), where \( \rho_m \) is a scalar and \( g_m(x) \) the gradient defined as

$$ g_m(x_i) = \left[ \frac{\partial {\cal L}(y_i, f(x_i))}{\partial f(x_i)}\right]_{f(x_i)=f_{m-1}(x_i)}. $$

With the new gradient we can update \( f_m(x) = f_{m-1}(x) -\rho_m g_m(x) \). Using the above squared-error function we see that the gradient is \( g_m(x_i) = -2(y_i-f(x_i)) \).

Choosing \( f_0(x)=0 \) we obtain \( g_m(x) = -2y_i \) and inserting this into the minimization problem for the cost function we have

$$ (\rho_1) = \mathrm{argmin}_{\rho}\hspace{0.1cm} \sum_{i=0}^{n-1}(y_i+2\rho y_i)^2. $$