Week 35: From Ordinary Linear Regression to Ridge and Lasso Regression

Loading [MathJax]/extensions/TeX/boldsymbol.js

Setting up the Matrix to be inverted

The matrix that may cause problems for us is $\boldsymbol{X}^T\boldsymbol{X}$ . Using the SVD we can rewrite this matrix as

$\boldsymbol{X}^T\boldsymbol{X}=\boldsymbol{V}\boldsymbol{\Sigma}^T\boldsymbol{U}^T\boldsymbol{U}\boldsymbol{\Sigma}\boldsymbol{V}^T,$

and using the orthogonality of the matrix $\boldsymbol{U}$ we have

$\boldsymbol{X}^T\boldsymbol{X}=\boldsymbol{V}\boldsymbol{\Sigma}^T\boldsymbol{\Sigma}\boldsymbol{V}^T.$

We define $\boldsymbol{\Sigma}^T\boldsymbol{\Sigma}=\tilde{\boldsymbol{\Sigma}}^2$ which is a diagonal matrix containing only the singular values squared. It has dimensionality $p \times p$ .

We can now insert the result for the matrix $\boldsymbol{X}^T\boldsymbol{X}$ into our equation for ordinary least squares where

$\tilde{y}_{\mathrm{OLS}}=\boldsymbol{X}\left(\boldsymbol{X}^T\boldsymbol{X}\right)^{-1}\boldsymbol{X}^T\boldsymbol{y},$

and using our SVD decomposition of $\boldsymbol{X}$ we have

$\tilde{y}_{\mathrm{OLS}}=\boldsymbol{U}\boldsymbol{\Sigma}\boldsymbol{V}^T\left(\boldsymbol{V}\tilde{\boldsymbol{\Sigma}}^{2}(\boldsymbol{V}^T\right)^{-1}\boldsymbol{V}\boldsymbol{\Sigma}^T\boldsymbol{U}^T\boldsymbol{y},$

which gives us, using the orthogonality of the matrix $\boldsymbol{V}$ ,

$\tilde{y}_{\mathrm{OLS}}=\boldsymbol{U}\boldsymbol{U}^T\boldsymbol{y}=\sum_{i=0}^{p-1}\boldsymbol{u}_i\boldsymbol{u}^T_i\boldsymbol{y},$

It means that the ordinary least square model (with the optimal parameters) $\boldsymbol{\tilde{y}}$ , corresponds to an orthogonal transformation of the output (or target) vector $\boldsymbol{y}$ by the vectors of the matrix $\boldsymbol{U}$ . Note that the summation ends at $p-1$ , that is $\boldsymbol{\tilde{y}}\ne \boldsymbol{y}$ . We can thus not use the orthogonality relation for the matrix $\boldsymbol{U}$ . This can already be when we multiply the matrices $\boldsymbol{\Sigma}^T\boldsymbol{U}^T$ .