A new Cost Function

We could now define a new cost function to minimize, namely the negative logarithm of the above PDF

$$ C(\boldsymbol{\beta})=-\log{\prod_{i=0}^{n-1}p(y_i,\boldsymbol{X}\vert\boldsymbol{\beta})}=-\sum_{i=0}^{n-1}\log{p(y_i,\boldsymbol{X}\vert\boldsymbol{\beta})}, $$

which becomes

$$ C(\boldsymbol{\beta})=\frac{n}{2}\log{2\pi\sigma^2}+\frac{\vert\vert \boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta}\vert\vert_2^2}{2\sigma^2}. $$
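To make this concrete, here is a minimal Python sketch (assuming NumPy) of the cost function above; the function name `negative_log_likelihood` and the treatment of the noise variance `sigma2` as known are illustrative assumptions, not part of the derivation.

```python
import numpy as np

def negative_log_likelihood(beta, X, y, sigma2):
    """Cost C(beta): negative log-likelihood for Gaussian noise with variance sigma2."""
    n = len(y)
    residual = y - X @ beta
    # Constant term n/2 * log(2*pi*sigma2) plus the scaled squared residual norm
    return 0.5 * n * np.log(2 * np.pi * sigma2) + (residual @ residual) / (2 * sigma2)
```

Note that the first term does not depend on \( \boldsymbol{\beta} \), so minimizing this cost is equivalent to minimizing the squared residual norm alone.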

Taking the derivative of the new cost function with respect to the parameters \( \boldsymbol{\beta} \) and setting it equal to zero, we recognize our familiar OLS equation, namely

$$ \boldsymbol{X}^T\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta}\right) =0, $$

which leads to the well-known OLS expression for the optimal parameters \( \boldsymbol{\beta} \)

$$ \hat{\boldsymbol{\beta}}^{\mathrm{OLS}}=\left(\boldsymbol{X}^T\boldsymbol{X}\right)^{-1}\boldsymbol{X}^T\boldsymbol{y}. $$
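As a quick numerical check, the following sketch verifies on synthetic data that the closed-form solution satisfies the gradient condition above; the use of `np.linalg.pinv` for numerical stability and the particular data dimensions are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma2 = 100, 3, 0.25
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=np.sqrt(sigma2), size=n)

# Closed-form OLS solution from the normal equations
beta_ols = np.linalg.pinv(X.T @ X) @ X.T @ y

# The gradient X^T (y - X beta) should vanish at the optimum
gradient = X.T @ (y - X @ beta_ols)
print(beta_ols)                     # close to beta_true
print(np.allclose(gradient, 0.0))   # True up to floating-point precision
```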

Before we make a similar analysis for Ridge and Lasso regression, we need a short reminder on statistics.