The Squared-Error again! Steepest Descent

We start again with our cost function {\cal C}(\boldsymbol{y},\boldsymbol{f})=\sum_{i=0}^{n-1}{\cal L}(y_i, f(x_i)), which we want to minimize. This means that for every iteration we need to optimize

\hat{\boldsymbol{f}} = \mathrm{argmin}_{\boldsymbol{f}}\hspace{0.1cm} \sum_{i=0}^{n-1}(y_i-f(x_i))^2.

We introduce a real function h_m(x) that builds up our final function f_M(x) as

f_M(x) = \sum_{m=0}^M h_m(x).

In the steepest descent approach we approximate h_m(x) = -\rho_m g_m(x), where \rho_m is a scalar and g_m(x) is the gradient defined as

g_m(x_i) = \left[ \frac{\partial {\cal L}(y_i, f(x_i))}{\partial f(x_i)}\right]_{f(x_i)=f_{m-1}(x_i)}.

With the new gradient we can update f_m(x) = f_{m-1}(x) - \rho_m g_m(x). Using the above squared-error function we see that the gradient is g_m(x_i) = -2(y_i-f(x_i)).
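
To make these steps concrete, here is a minimal numerical sketch (not part of the original text; the names y, f, rho and n_steps are illustrative) of steepest descent on the squared-error cost, with the function represented by its values at the training points: at every iteration the gradient g_m(x_i) = -2(y_i-f_{m-1}(x_i)) is computed and the current fit is moved a step of length \rho against it.

import numpy as np

# Illustrative sketch: steepest descent on C(y, f) = sum_i (y_i - f(x_i))^2,
# with f represented by its values at the training points.
rng = np.random.default_rng(seed=0)
y = rng.normal(size=10)        # training targets y_i
f = np.zeros_like(y)           # starting guess f_0(x_i) = 0 for all i
rho = 0.1                      # fixed step length
n_steps = 50

for m in range(1, n_steps + 1):
    g = -2.0 * (y - f)         # gradient g_m(x_i) = -2 (y_i - f_{m-1}(x_i))
    f = f - rho * g            # update f_m(x_i) = f_{m-1}(x_i) - rho g_m(x_i)

print(np.sum((y - f)**2))      # the cost decreases towards zero with m

With this fixed \rho the residual y_i-f_m(x_i) shrinks by a factor 1-2\rho at every iteration, so the cost tends to zero as m grows.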

Choosing f_0(x)=0 we obtain g_1(x_i) = -2y_i, and inserting this into the minimization problem for the cost function we have

\rho_1 = \mathrm{argmin}_{\rho}\hspace{0.1cm} \sum_{i=0}^{n-1}(y_i-2\rho y_i)^2.
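
For completeness, carrying out this minimization explicitly: setting the derivative with respect to \rho equal to zero,

\frac{d}{d\rho}\sum_{i=0}^{n-1}(y_i-2\rho y_i)^2 = -4(1-2\rho)\sum_{i=0}^{n-1}y_i^2 = 0,

gives \rho_1 = 1/2, and therefore f_1(x_i) = f_0(x_i)-\rho_1 g_1(x_i) = y_i.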