What is presented here is a mathematical analysis of various regression methods (ordinary least squares, Ridge regression and Lasso regression). The analysis is based on an important matrix factorization in linear algebra, the so-called Singular Value Decomposition (SVD).
We have shown that in ordinary least squares the optimal parameters \beta are given by
\hat{\boldsymbol{\beta}} = \left(\boldsymbol{X}^T\boldsymbol{X}\right)^{-1}\boldsymbol{X}^T\boldsymbol{y}.

The hat over \boldsymbol{\beta} means we have the optimal parameters after minimization of the cost function.
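A minimal numerical sketch of this expression, using NumPy and synthetic data (the sample size, number of features and true parameters below are illustrative assumptions, not taken from the text), could look like:

```python
import numpy as np

# Synthetic data; sizes and true coefficients are illustrative assumptions.
rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))                   # design matrix
beta_true = np.array([1.0, -2.0, 0.5])        # hypothetical true parameters
y = X @ beta_true + 0.1 * rng.normal(size=n)  # noisy targets

# Optimal OLS parameters from the normal equations:
# beta_hat = (X^T X)^{-1} X^T y
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y

# Numerically safer alternative via the pseudoinverse, which is based on the SVD:
beta_hat_svd = np.linalg.pinv(X) @ y

print(beta_hat)
print(beta_hat_svd)
```

In practice the pseudoinverse (or a least-squares solver) is preferred over inverting \boldsymbol{X}^T\boldsymbol{X} directly when the design matrix is ill-conditioned.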
This means that our best model is defined as
\tilde{\boldsymbol{y}}=\boldsymbol{X}\hat{\boldsymbol{\beta}} = \boldsymbol{X}\left(\boldsymbol{X}^T\boldsymbol{X}\right)^{-1}\boldsymbol{X}^T\boldsymbol{y}.

We now define a matrix
\boldsymbol{A}=\boldsymbol{X}\left(\boldsymbol{X}^T\boldsymbol{X}\right)^{-1}\boldsymbol{X}^T.

We can then rewrite our model as
\tilde{\boldsymbol{y}}=\boldsymbol{X}\hat{\boldsymbol{\beta}} = \boldsymbol{A}\boldsymbol{y}.

The matrix \boldsymbol{A} has the important property that \boldsymbol{A}^2=\boldsymbol{A}, that is, it is idempotent. This is the defining property of a projection matrix. We can then interpret our optimal model \tilde{\boldsymbol{y}} as an orthogonal projection of \boldsymbol{y} onto the space spanned by the column vectors of \boldsymbol{X}. In our case the matrix \boldsymbol{A} is also symmetric, \boldsymbol{A}^T=\boldsymbol{A}, which is what makes the projection orthogonal; an idempotent matrix that is not symmetric defines an oblique projection.
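These properties are easy to verify numerically. The sketch below uses synthetic data (purely illustrative) to check that \boldsymbol{A} is idempotent and symmetric, and that \boldsymbol{A}\boldsymbol{y} reproduces the fitted values \boldsymbol{X}\hat{\boldsymbol{\beta}}:

```python
import numpy as np

# Synthetic data, purely illustrative.
rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

# The projection ("hat") matrix A = X (X^T X)^{-1} X^T
A = X @ np.linalg.inv(X.T @ X) @ X.T

# Idempotence and symmetry: A^2 = A and A^T = A
print(np.allclose(A @ A, A))    # True: A is idempotent
print(np.allclose(A.T, A))      # True: the projection is orthogonal

# The fitted values y_tilde = X beta_hat coincide with A y
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y
print(np.allclose(X @ beta_hat, A @ y))   # True
```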