We could now define a new cost function to minimize, namely the negative logarithm of the above PDF
$$ C(\boldsymbol{\theta})=-\log{\prod_{i=0}^{n-1}p(y_i,\boldsymbol{X}\vert\boldsymbol{\theta})}=-\sum_{i=0}^{n-1}\log{p(y_i,\boldsymbol{X}\vert\boldsymbol{\theta})}, $$which becomes
$$ C(\boldsymbol{\theta})=\frac{n}{2}\log{2\pi\sigma^2}+\frac{\vert\vert\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\theta}\vert\vert_2^2}{2\sigma^2}. $$The first term does not depend on \( \boldsymbol{\theta} \), so taking the derivative of the new cost function with respect to the parameters \( \boldsymbol{\theta} \) gives
$$ \nabla_{\boldsymbol{\theta}} C(\boldsymbol{\theta})=-\frac{1}{\sigma^2}\boldsymbol{X}^T\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\theta}\right). $$Setting this derivative equal to zero, we recognize our familiar OLS equation, namely
$$ \boldsymbol{X}^T\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\theta}\right) =0, $$which leads to the well-known OLS equation for the optimal parameters \( \boldsymbol{\theta} \)
$$ \hat{\boldsymbol{\theta}}^{\mathrm{OLS}}=\left(\boldsymbol{X}^T\boldsymbol{X}\right)^{-1}\boldsymbol{X}^T\boldsymbol{y}! $$
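As a quick numerical sanity check, here is a minimal sketch in Python: we minimize \( C(\boldsymbol{\theta}) \) directly with a general-purpose optimizer and compare the result with the closed-form OLS estimate. The synthetic dataset, the seed, and names such as `theta_true` and `C` are illustrative assumptions, not part of the derivation above.

```python
# A minimal numerical check on synthetic data: minimizing the negative
# log-likelihood C(theta) should reproduce the closed-form OLS solution.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2024)
n, p, sigma = 100, 3, 0.5            # data points, parameters, noise level (illustrative)

X = rng.normal(size=(n, p))          # design matrix
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true + sigma * rng.normal(size=n)

# Negative log-likelihood C(theta) for fixed, known sigma
def C(theta):
    residual = y - X @ theta
    return 0.5 * n * np.log(2 * np.pi * sigma**2) + residual @ residual / (2 * sigma**2)

# Closed-form OLS solution from the normal equation X^T(y - X theta) = 0
theta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Numerical minimization of C(theta), starting from zero
theta_mle = minimize(C, x0=np.zeros(p)).x

print(theta_ols)
print(theta_mle)   # agrees with theta_ols up to optimizer tolerance
```

Note that since the first term of \( C(\boldsymbol{\theta}) \) is constant in \( \boldsymbol{\theta} \), the minimizer coincides with the least-squares solution regardless of the value of \( \sigma \). Next week we will carry out a similar analysis for Ridge and Lasso regression.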