
The mean squared error and its derivative

We defined earlier a possible cost function using the mean squared error

C(\boldsymbol{\beta})=\frac{1}{n}\sum_{i=0}^{n-1}\left(y_i-\tilde{y}_i\right)^2=\frac{1}{n}\left\{\left(\boldsymbol{y}-\boldsymbol{\tilde{y}}\right)^T\left(\boldsymbol{y}-\boldsymbol{\tilde{y}}\right)\right\},

or, using the design/feature matrix \boldsymbol{X} , we have the more compact matrix-vector form

C(\boldsymbol{\beta})=\frac{1}{n}\left\{\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta}\right)^T\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta}\right)\right\}.
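As a quick numerical sanity check (a sketch using NumPy and a small synthetic data set of our own choosing, not part of the derivation), the elementwise sum and the compact matrix-vector form give the same cost:

```python
import numpy as np

rng = np.random.default_rng(0)        # hypothetical small example
n, p = 10, 3
X = rng.normal(size=(n, p))           # design/feature matrix
beta = rng.normal(size=p)             # some parameter vector
y = X @ beta + 0.1 * rng.normal(size=n)

ytilde = X @ beta                     # model prediction, \tilde{y} = X beta

# Elementwise definition of the mean squared error
cost_sum = np.sum((y - ytilde) ** 2) / n

# Compact matrix-vector form, (y - X beta)^T (y - X beta) / n
r = y - X @ beta
cost_matrix = (r.T @ r) / n

print(np.allclose(cost_sum, cost_matrix))  # → True
```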

We note that the design matrix \boldsymbol{X} does not depend on the unknown parameters defined by the vector \boldsymbol{\beta}. We are now interested in minimizing the cost function with respect to the unknown parameters \boldsymbol{\beta}.

The mean squared error is a scalar, and if we use the results from example three above, we can define a new vector

\boldsymbol{w}=\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta},

which depends on \boldsymbol{\beta} . We rewrite the cost function as

C(\boldsymbol{\beta})=\frac{1}{n}\boldsymbol{w}^T\boldsymbol{w},

with partial derivative

\frac{\partial C(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}=\frac{2}{n}\boldsymbol{w}^T\frac{\partial \boldsymbol{w}}{\partial \boldsymbol{\beta}},

and, using the result from example two above,

\frac{\partial \boldsymbol{w}}{\partial \boldsymbol{\beta}}=-\boldsymbol{X}.

Inserting the last expression we obtain

\frac{\partial C(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}=-\frac{2}{n}\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta}\right)^T\boldsymbol{X},

or, taking the transpose to write the gradient as a column vector,

\frac{\partial C(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}^T}=-\frac{2}{n}\boldsymbol{X}^T\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta}\right).
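The analytic gradient can be checked numerically. The sketch below (assuming NumPy and a small random problem of our own making) compares -\frac{2}{n}\boldsymbol{X}^T(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta}) with a central finite-difference approximation of the cost function:

```python
import numpy as np

rng = np.random.default_rng(1)        # hypothetical test problem
n, p = 50, 4
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
beta = rng.normal(size=p)

def cost(b):
    """Mean squared error C(beta) = (y - Xb)^T (y - Xb) / n."""
    r = y - X @ b
    return (r @ r) / n

# Analytic gradient from the derivation above
grad_analytic = -2.0 / n * X.T @ (y - X @ beta)

# Central finite differences, one component of beta at a time
eps = 1e-6
grad_numeric = np.empty(p)
for j in range(p):
    e = np.zeros(p)
    e[j] = eps
    grad_numeric[j] = (cost(beta + e) - cost(beta - e)) / (2 * eps)

print(np.allclose(grad_analytic, grad_numeric, atol=1e-6))  # → True
```

Since the cost is quadratic in \boldsymbol{\beta} , the central difference is exact up to floating-point rounding, so the two gradients agree to high precision.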