
Mathematical Interpretation of Ordinary Least Squares

What is presented here is a mathematical analysis of various regression methods (ordinary least squares, Ridge and Lasso regression). The analysis is based on an important matrix factorization from linear algebra, the so-called Singular Value Decomposition (SVD).
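As a small numerical illustration of the factorization we will rely on, the following sketch computes the SVD of an invented design matrix with NumPy and checks that the factors reproduce it (the matrix and its dimensions are our own choice, not taken from the notes):

```python
# Minimal sketch: the SVD of a hypothetical 10x3 design matrix X.
import numpy as np

rng = np.random.default_rng(2024)
X = rng.normal(size=(10, 3))                 # invented design matrix

# Economy-size SVD: X = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(X, full_matrices=False)

print(np.allclose(X, U @ np.diag(s) @ Vt))   # True: factorization reproduces X
print("singular values:", s)
```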

We have shown that in ordinary least squares the optimal parameters \boldsymbol{\beta} are given by

\hat{\boldsymbol{\beta}} = \left(\boldsymbol{X}^T\boldsymbol{X}\right)^{-1}\boldsymbol{X}^T\boldsymbol{y}.

The hat over \boldsymbol{\beta} means we have the optimal parameters after minimization of the cost function.

This means that our best model is defined as

\tilde{\boldsymbol{y}}=\boldsymbol{X}\hat{\boldsymbol{\beta}} = \boldsymbol{X}\left(\boldsymbol{X}^T\boldsymbol{X}\right)^{-1}\boldsymbol{X}^T\boldsymbol{y}.
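A minimal sketch of these two equations in NumPy, using a synthetic data set and a simple design matrix of our own choosing (the true coefficients 2 and 3 and the noise level are invented for illustration): we solve the normal equations for \hat{\boldsymbol{\beta}} and form the model \tilde{\boldsymbol{y}}=\boldsymbol{X}\hat{\boldsymbol{\beta}}.

```python
# Minimal sketch: OLS via the normal equations on synthetic data.
import numpy as np

rng = np.random.default_rng(42)
n = 50
x = np.linspace(0, 1, n)
y = 2.0 + 3.0 * x + 0.1 * rng.normal(size=n)   # synthetic data (assumed model)

X = np.column_stack([np.ones(n), x])           # design matrix [1, x]

# beta-hat = (X^T X)^{-1} X^T y; solving the linear system is preferred
# over forming the explicit inverse.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_tilde = X @ beta_hat

print("beta-hat:", beta_hat)                   # close to [2, 3]
```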

We now define a matrix

\boldsymbol{A}=\boldsymbol{X}\left(\boldsymbol{X}^T\boldsymbol{X}\right)^{-1}\boldsymbol{X}^T.

We can rewrite

\tilde{\boldsymbol{y}}=\boldsymbol{X}\hat{\boldsymbol{\beta}} = \boldsymbol{A}\boldsymbol{y}.

The matrix \boldsymbol{A} has the important property that \boldsymbol{A}^2=\boldsymbol{A}, since

\boldsymbol{A}^2 = \boldsymbol{X}\left(\boldsymbol{X}^T\boldsymbol{X}\right)^{-1}\boldsymbol{X}^T\boldsymbol{X}\left(\boldsymbol{X}^T\boldsymbol{X}\right)^{-1}\boldsymbol{X}^T = \boldsymbol{X}\left(\boldsymbol{X}^T\boldsymbol{X}\right)^{-1}\boldsymbol{X}^T = \boldsymbol{A}.

This is the definition of a projection matrix. We can then interpret our optimal model \tilde{\boldsymbol{y}} as an orthogonal projection of \boldsymbol{y} onto the space spanned by the column vectors of \boldsymbol{X}. In our case the matrix \boldsymbol{A} is also symmetric, \boldsymbol{A}^T=\boldsymbol{A}, which is what makes the projection orthogonal. A projection matrix that is idempotent but not symmetric defines an oblique projection.
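These properties are easy to verify numerically. The sketch below uses a small invented design matrix (our own choice) and checks that \boldsymbol{A} is idempotent and symmetric, and that \boldsymbol{A}\boldsymbol{y} coincides with \boldsymbol{X}\hat{\boldsymbol{\beta}}.

```python
# Minimal sketch: verify that A = X (X^T X)^{-1} X^T is an orthogonal projector.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))                    # hypothetical 8x3 design matrix
y = rng.normal(size=8)

A = X @ np.linalg.inv(X.T @ X) @ X.T

print(np.allclose(A @ A, A))                   # idempotent: A^2 = A
print(np.allclose(A, A.T))                     # symmetric: orthogonal projection
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(A @ y, X @ beta_hat))        # y-tilde = A y = X beta-hat
```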