Week 36: Linear Regression and Statistical interpretations


A new Cost Function

We could now define a new cost function to minimize, namely the negative logarithm of the above PDF

$C(\boldsymbol{\beta})=-\log{\prod_{i=0}^{n-1}p(y_i,\boldsymbol{X}\vert\boldsymbol{\beta})}=-\sum_{i=0}^{n-1}\log{p(y_i,\boldsymbol{X}\vert\boldsymbol{\beta})},$

which becomes

$C(\boldsymbol{\beta})=\frac{n}{2}\log{(2\pi\sigma^2)}+\frac{\vert\vert (\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta})\vert\vert_2^2}{2\sigma^2}.$
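
The step from the sum to this expression can be sketched as follows, assuming the PDF referred to above is the Gaussian $p(y_i,\boldsymbol{X}\vert\boldsymbol{\beta})=\frac{1}{\sqrt{2\pi\sigma^2}}\exp{\left[-\frac{(y_i-\boldsymbol{X}_{i,*}\boldsymbol{\beta})^2}{2\sigma^2}\right]}$ (an assumption here, since the PDF is not restated in this section, and $\boldsymbol{X}_{i,*}$ is notation introduced for row $i$ of the design matrix). Each term in the sum is then

$-\log{p(y_i,\boldsymbol{X}\vert\boldsymbol{\beta})}=\frac{1}{2}\log{(2\pi\sigma^2)}+\frac{(y_i-\boldsymbol{X}_{i,*}\boldsymbol{\beta})^2}{2\sigma^2},$

and summing over $i=0,1,\dots,n-1$ gives $n$ copies of the first term and the squared Euclidean norm of the residual vector in the second.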

Taking the derivative of the new cost function with respect to the parameters $\boldsymbol{\beta}$ and setting it equal to zero, we recognize our familiar OLS equation, namely

$\boldsymbol{X}^T\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta}\right) =0,$

which leads to the well-known OLS equation for the optimal parameters $\boldsymbol{\beta}$

$\hat{\boldsymbol{\beta}}^{\mathrm{OLS}}=\left(\boldsymbol{X}^T\boldsymbol{X}\right)^{-1}\boldsymbol{X}^T\boldsymbol{y}.$
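
The result can be checked numerically. The sketch below is only an illustration and is not taken from the lecture material: the synthetic data, the noise level `sigma`, and the function names `neg_log_likelihood` and `gradient` are all assumptions made for this example. It uses the gradient $\partial C/\partial\boldsymbol{\beta}=-\boldsymbol{X}^T(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta})/\sigma^2$ of the cost function above and verifies that it vanishes at the OLS solution, and that moving away from that solution increases the cost.

```python
import numpy as np


def neg_log_likelihood(beta, X, y, sigma2):
    """Cost C(beta) = n/2 log(2 pi sigma^2) + ||y - X beta||^2 / (2 sigma^2)."""
    n = len(y)
    resid = y - X @ beta
    return 0.5 * n * np.log(2.0 * np.pi * sigma2) + resid @ resid / (2.0 * sigma2)


def gradient(beta, X, y, sigma2):
    """Gradient of C with respect to beta: -X^T (y - X beta) / sigma^2."""
    return -X.T @ (y - X @ beta) / sigma2


# Synthetic data (hypothetical values chosen only for this illustration)
rng = np.random.default_rng(2024)
n, p = 100, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
sigma = 0.3
y = X @ beta_true + sigma * rng.normal(size=n)

# OLS solution from the normal equations X^T X beta = X^T y
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

grad_at_ols = gradient(beta_ols, X, y, sigma**2)
cost_at_ols = neg_log_likelihood(beta_ols, X, y, sigma**2)
cost_perturbed = neg_log_likelihood(beta_ols + 0.01, X, y, sigma**2)

print("OLS estimate:", beta_ols)
print("Largest gradient component at OLS estimate:", np.max(np.abs(grad_at_ols)))
print("Cost at OLS estimate:", cost_at_ols)
print("Cost at perturbed parameters:", cost_perturbed)
```

The normal equations are solved with `np.linalg.solve` rather than by forming the inverse explicitly, which is a common choice for numerical stability. Note that $\sigma^2$ only rescales the gradient and shifts the cost, so the minimizer does not depend on it.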

Before we make a similar analysis for Ridge and Lasso regression, we need a short reminder on statistics.