Minimizing with respect to \boldsymbol{\beta} then gives
\hat{\boldsymbol{\beta}} = (\tilde{X}^T\tilde{X})^{-1}\tilde{X}^T\boldsymbol{\tilde{y}},
where the outputs and the columns of the design matrix have been centered, that is \boldsymbol{\tilde{y}} = \boldsymbol{y} - \overline{\boldsymbol{y}} and \tilde{X}_{ij} = X_{ij} - \frac{1}{n}\sum_{k=0}^{n-1}X_{kj}.
For Ridge regression we add the penalty term \lambda \boldsymbol{\beta}^T\boldsymbol{\beta} to the cost function and obtain
\hat{\boldsymbol{\beta}} = (\tilde{X}^T\tilde{X} + \lambda I)^{-1}\tilde{X}^T\boldsymbol{\tilde{y}}.
What does this mean, and why do we insist on all this? Let us look at some examples.
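As a first minimal sketch, the two closed-form expressions above can be evaluated directly with NumPy. The synthetic data set, the value of \lambda and the helper names `center`, `ols_centered` and `ridge_centered` are assumptions made for illustration only. The intercept is recovered afterwards from the means as \overline{y} - \sum_j \overline{X}_j \hat{\beta}_j, which is the standard relation for centered data and is added here so that the fit can be compared with one that includes an explicit intercept column.

```python
import numpy as np


def center(X, y):
    """Subtract the column means from X and the mean from y."""
    X_mean = X.mean(axis=0)
    y_mean = y.mean()
    return X - X_mean, y - y_mean, X_mean, y_mean


def ols_centered(X, y):
    """OLS on centered data: beta = (Xc^T Xc)^{-1} Xc^T yc."""
    Xc, yc, X_mean, y_mean = center(X, y)
    beta = np.linalg.solve(Xc.T @ Xc, Xc.T @ yc)
    intercept = y_mean - X_mean @ beta  # recovered from the means
    return beta, intercept


def ridge_centered(X, y, lam):
    """Ridge on centered data: beta = (Xc^T Xc + lam I)^{-1} Xc^T yc."""
    Xc, yc, X_mean, y_mean = center(X, y)
    p = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    intercept = y_mean - X_mean @ beta  # the intercept itself is not penalized
    return beta, intercept


# Synthetic data (an assumption for this sketch): a noisy second-order polynomial.
rng = np.random.default_rng(2021)
n = 100
x = rng.uniform(0.0, 1.0, n)
X = np.column_stack([x, x**2])  # design matrix without an intercept column
y = 2.0 + 3.0 * x - 1.5 * x**2 + 0.1 * rng.standard_normal(n)

beta_ols, b0_ols = ols_centered(X, y)
beta_ridge, b0_ridge = ridge_centered(X, y, lam=0.1)

print("OLS   beta:", beta_ols, "intercept:", b0_ols)
print("Ridge beta:", beta_ridge, "intercept:", b0_ridge)
```

With centered data the intercept drops out of the minimization, so the Ridge penalty only acts on the remaining parameters. Setting `lam=0.0` in `ridge_centered` reproduces the OLS result, and for any `lam > 0` the norm of the Ridge parameters is smaller than that of the OLS parameters.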