Week 35: From Ordinary Linear Regression to Ridge and Lasso Regression

The cost/loss function

We used the mean squared error (MSE) to measure the quality of our model:

$$ C(\boldsymbol{\theta})=\frac{1}{n}\sum_{i=0}^{n-1}\left(y_i-\tilde{y}_i\right)^2=\frac{1}{n}\left\{\left(\boldsymbol{y}-\boldsymbol{\tilde{y}}\right)^T\left(\boldsymbol{y}-\boldsymbol{\tilde{y}}\right)\right\}, $$

or, inserting the model $ \boldsymbol{\tilde{y}}=\boldsymbol{X}\boldsymbol{\theta} $ with the matrix $ \boldsymbol{X} $, in a more compact matrix-vector notation as

$$ C(\boldsymbol{\theta})=\frac{1}{n}\left\{\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\theta}\right)^T\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\theta}\right)\right\}. $$

This function represents one of many possible ways to define the so-called cost function.
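As a quick numerical check, the sketch below evaluates the cost function in both the summation form and the matrix-vector form and confirms that they agree. The synthetic data, the parameter values, and the helper names `mse_sum` and `mse_matrix` are assumptions made for illustration only.

```python
import numpy as np

# Synthetic data (illustrative assumption): n samples, p features
rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))
theta = np.array([1.0, -2.0, 0.5])
y = X @ theta + 0.1 * rng.normal(size=n)


def mse_sum(y, y_tilde):
    """Summation form: (1/n) * sum_i (y_i - y_tilde_i)^2."""
    return np.sum((y - y_tilde) ** 2) / len(y)


def mse_matrix(X, y, theta):
    """Matrix-vector form: (1/n) * (y - X theta)^T (y - X theta)."""
    residual = y - X @ theta
    return (residual @ residual) / len(y)


y_tilde = X @ theta
c_sum = mse_sum(y, y_tilde)
c_mat = mse_matrix(X, y, theta)
print(c_sum, c_mat)
```

The two values should coincide up to floating-point rounding, since the inner product of the residual vector with itself is exactly the sum of squared residuals.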

It is also common to define the function $ C $ as

$$ C(\boldsymbol{\theta})=\frac{1}{2n}\sum_{i=0}^{n-1}\left(y_i-\tilde{y}_i\right)^2, $$

since the factor of $ 2 $ produced by differentiating the square with respect to the unknown parameters $ \boldsymbol{\theta} $ then cancels against the prefactor $ 1/2 $.
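To make the cancellation explicit, here is a short derivation sketch, assuming the model $ \boldsymbol{\tilde{y}}=\boldsymbol{X}\boldsymbol{\theta} $ and that $ \boldsymbol{X} $ and $ \boldsymbol{y} $ do not depend on $ \boldsymbol{\theta} $. With the $ 1/n $ normalization the gradient carries a factor of $ 2 $,

$$ \frac{\partial}{\partial \boldsymbol{\theta}}\left[\frac{1}{n}\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\theta}\right)^T\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\theta}\right)\right]=-\frac{2}{n}\boldsymbol{X}^T\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\theta}\right), $$

whereas with the $ 1/(2n) $ normalization it does not,

$$ \frac{\partial}{\partial \boldsymbol{\theta}}\left[\frac{1}{2n}\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\theta}\right)^T\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\theta}\right)\right]=-\frac{1}{n}\boldsymbol{X}^T\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\theta}\right). $$

Both definitions have the same minimizer, since a positive constant factor does not change where the gradient vanishes.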