Wrapping it up

Minimizing the cost function with respect to \boldsymbol{\beta} then gives

\hat{\boldsymbol{\beta}} = (\tilde{X}^T\tilde{X})^{-1}\tilde{X}^T\boldsymbol{\tilde{y}},

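where \boldsymbol{\tilde{y}} = \boldsymbol{y} - \overline{\boldsymbol{y}} and \tilde{X}_{ij} = X_{ij} - \frac{1}{n}\sum_{k=0}^{n-1}X_{kj}. In other words, the outputs and every column of the design matrix are centered by subtracting their mean values.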
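The centered expression above can be checked numerically. The following is a minimal sketch, not a definitive implementation: the function name `centered_ols`, the synthetic data and the polynomial features are all assumptions made for illustration. It also recovers the intercept as the mean of y minus the column means of X times the fitted parameters, a standard result that is not derived in this section, and compares everything with an ordinary least-squares fit that includes an explicit column of ones.

```python
import numpy as np


def centered_ols(X, y):
    """OLS on mean-centered data; returns (beta, intercept).

    X must NOT contain a column of ones. The intercept is recovered
    afterwards from the means of X and y.
    """
    X_mean = X.mean(axis=0)          # column means of the design matrix
    y_mean = y.mean()                # mean of the outputs
    X_tilde = X - X_mean             # centered design matrix
    y_tilde = y - y_mean             # centered outputs
    # Solve (X~^T X~) beta = X~^T y~ rather than forming the inverse.
    beta = np.linalg.solve(X_tilde.T @ X_tilde, X_tilde.T @ y_tilde)
    intercept = y_mean - X_mean @ beta
    return beta, intercept


# Synthetic data (illustrative assumption): second-order polynomial plus noise.
rng = np.random.default_rng(0)
n = 50
x = rng.uniform(0, 1, n)
X = np.column_stack([x, x**2])
y = 2.0 + 3.0 * x - 1.5 * x**2 + 0.1 * rng.standard_normal(n)

beta, intercept = centered_ols(X, y)

# Reference fit with an explicit intercept column of ones.
X_full = np.column_stack([np.ones(n), X])
beta_full = np.linalg.lstsq(X_full, y, rcond=None)[0]

print("centered fit:  intercept =", intercept, " beta =", beta)
print("reference fit: intercept =", beta_full[0], " beta =", beta_full[1:])
```

Both fits should agree to numerical precision, which illustrates that centering removes the intercept from the optimization without changing the remaining parameters.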

For Ridge regression we add the penalty term \lambda \boldsymbol{\beta}^T\boldsymbol{\beta} to the cost function and obtain

\hat{\boldsymbol{\beta}} = (\tilde{X}^T\tilde{X} + \lambda I)^{-1}\tilde{X}^T\boldsymbol{\tilde{y}}.
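As a hedged illustration of the Ridge expression, the sketch below centers the data, solves the regularized linear system and recovers the intercept from the means. The function name `centered_ridge`, the synthetic data and the chosen values of \lambda are assumptions for this example only. Because the data are centered first, the intercept is left out of the penalty.

```python
import numpy as np


def centered_ridge(X, y, lam):
    """Ridge regression on mean-centered data; returns (beta, intercept).

    X must NOT contain a column of ones, so the intercept is not penalized.
    """
    X_mean = X.mean(axis=0)
    y_mean = y.mean()
    X_tilde = X - X_mean             # centered design matrix
    y_tilde = y - y_mean             # centered outputs
    p = X.shape[1]
    # Solve (X~^T X~ + lam * I) beta = X~^T y~
    A = X_tilde.T @ X_tilde + lam * np.eye(p)
    beta = np.linalg.solve(A, X_tilde.T @ y_tilde)
    intercept = y_mean - X_mean @ beta
    return beta, intercept


# Synthetic data (illustrative assumption): third-order polynomial plus noise.
rng = np.random.default_rng(1)
n = 40
x = rng.uniform(-1, 1, n)
X = np.column_stack([x, x**2, x**3])
y = 1.0 + 2.0 * x - 0.5 * x**3 + 0.1 * rng.standard_normal(n)

for lam in [0.0, 0.1, 1.0, 10.0]:
    beta, intercept = centered_ridge(X, y, lam)
    print(f"lambda = {lam:5.1f}  |beta| = {np.linalg.norm(beta):.4f}  "
          f"intercept = {intercept:.4f}")
```

With \lambda = 0 the result reduces to the ordinary least-squares solution on the centered data, and the norm of the parameter vector does not increase as \lambda grows.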

What does this mean? And why do we insist on all this? Let us look at some examples.