Week 35: From Ordinary Linear Regression to Ridge and Lasso Regression


What does it mean?

This means that the column vectors $\boldsymbol{v}_i$ of the orthogonal matrix $\boldsymbol{V}$ are the eigenvectors of the matrix $\boldsymbol{X}^T\boldsymbol{X}$, with eigenvalues given by the squared singular values, that is

$\left(\boldsymbol{X}^T\boldsymbol{X}\right)\boldsymbol{v}_i=\boldsymbol{v}_i\sigma_i^2.$

In other words, each non-zero singular value of $\boldsymbol{X}$ is the positive square root of an eigenvalue of $\boldsymbol{X}^T\boldsymbol{X}$. Since we have ordered the singular values of $\boldsymbol{X}$ in descending order, the column vectors $\boldsymbol{v}_i$ are ordered hierarchically by how much of the correlation among the columns of $\boldsymbol{X}$ they encode. More precisely, $\sigma_i^2=\|\boldsymbol{X}\boldsymbol{v}_i\|_2^2$, so $\boldsymbol{v}_1$ is the direction along which the data in $\boldsymbol{X}$ are most spread out.
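The relation $\left(\boldsymbol{X}^T\boldsymbol{X}\right)\boldsymbol{v}_i=\boldsymbol{v}_i\sigma_i^2$ can be checked numerically. The sketch below is a minimal illustration, assuming a small random design matrix made up for the purpose (the names `X`, `sigma` and `V` are ours); it uses NumPy's `np.linalg.svd`, which returns the singular values in descending order, and `np.linalg.eigvalsh`, which returns the eigenvalues of a symmetric matrix in ascending order.

```python
import numpy as np

# Hypothetical random design matrix: n = 100 samples, p = 4 features (illustration only)
rng = np.random.default_rng(2024)
n, p = 100, 4
X = rng.normal(size=(n, p))

# Thin SVD: X = U diag(sigma) V^T; NumPy returns sigma in descending order
U, sigma, VT = np.linalg.svd(X, full_matrices=False)
V = VT.T

XtX = X.T @ X
# Check (X^T X) v_i = sigma_i^2 v_i for every column v_i of V
for i in range(p):
    v_i = V[:, i]
    assert np.allclose(XtX @ v_i, sigma[i] ** 2 * v_i)

# sigma_i^2 = ||X v_i||^2: the columns of V are ordered by how much spread they capture
assert np.allclose(np.linalg.norm(X @ V, axis=0) ** 2, sigma ** 2)

# Eigenvalues of the symmetric matrix X^T X (eigvalsh returns them in ascending order)
eigvals = np.linalg.eigvalsh(XtX)
print(np.allclose(np.sort(sigma ** 2), eigvals))  # True
```

Using `eigvalsh` rather than the general `eig` exploits the symmetry of $\boldsymbol{X}^T\boldsymbol{X}$ and guarantees real eigenvalues.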

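Note that the vectors $\boldsymbol{v}_i$ are also the eigenvectors of the Hessian matrix, whose eigenvalues are the squared singular values up to a constant factor. Note also that the Hessian matrix we are discussing here is from a cost function defined by the mean squared error only: for $C(\boldsymbol{\beta})=\frac{1}{n}\|\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta}\|_2^2$ the Hessian is $\frac{2}{n}\boldsymbol{X}^T\boldsymbol{X}$, with eigenvalues $2\sigma_i^2/n$.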
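As a numerical sanity check, the sketch below builds the Hessian of the mean squared error by central finite differences of its gradient and compares it with $\frac{2}{n}\boldsymbol{X}^T\boldsymbol{X}$. It assumes the cost $C(\boldsymbol{\beta})=\frac{1}{n}\|\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta}\|_2^2$ and made-up random data; `mse_gradient`, `h` and `beta0` are hypothetical names chosen for this illustration.

```python
import numpy as np

# Hypothetical data for illustration: n = 50 samples, p = 3 features
rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

def mse_gradient(beta):
    """Gradient of the assumed cost C(beta) = (1/n) ||y - X beta||^2."""
    return -2.0 / n * X.T @ (y - X @ beta)

# Build the Hessian column by column with central differences of the gradient.
# C is quadratic, so this agrees with (2/n) X^T X up to rounding error.
h = 1e-5
beta0 = np.zeros(p)
H = np.empty((p, p))
for j in range(p):
    e_j = np.zeros(p)
    e_j[j] = h
    H[:, j] = (mse_gradient(beta0 + e_j) - mse_gradient(beta0 - e_j)) / (2 * h)

_, sigma, VT = np.linalg.svd(X, full_matrices=False)
V = VT.T

# Same eigenvectors v_i as X^T X; eigenvalues scaled by 2/n
for i in range(p):
    assert np.allclose(H @ V[:, i], (2.0 / n) * sigma[i] ** 2 * V[:, i], atol=1e-6)

print(np.allclose(H, 2.0 / n * X.T @ X, atol=1e-6))  # True
```

Because the cost is quadratic in $\boldsymbol{\beta}$, the Hessian does not depend on the point `beta0` at which it is evaluated.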

If we now recall the definition of the covariance matrix (without Bessel's correction), for a design matrix whose columns have been centered (mean subtracted) we have

$\boldsymbol{C}[\boldsymbol{X}]=\frac{1}{n}\boldsymbol{X}^T\boldsymbol{X},$

meaning that the squared singular values of $\boldsymbol{X}$ divided by $n$ (the number of samples) are the eigenvalues of the covariance matrix, with the same eigenvectors $\boldsymbol{v}_i$. If the matrix $\boldsymbol{X}$ is square and self-adjoint (symmetric, for a real matrix), its singular values are equal to the absolute values of its eigenvalues.
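Both statements can be illustrated with a short NumPy sketch, assuming randomly generated data (the mixing matrix and the names `C`, `A` and `B` are hypothetical). The columns of `X` are centered first, so that $\frac{1}{n}\boldsymbol{X}^T\boldsymbol{X}$ coincides with `np.cov(X, rowvar=False, bias=True)`, where `bias=True` divides by $n$ (no Bessel correction).

```python
import numpy as np

# Hypothetical data: n = 200 samples of p = 3 correlated features (illustration only)
rng = np.random.default_rng(1)
n, p = 200, 3
mixing = np.array([[2.0, 0.0, 0.0],
                   [0.5, 1.0, 0.0],
                   [0.0, 0.3, 0.2]])
X = rng.normal(size=(n, p)) @ mixing
X = X - X.mean(axis=0)  # center each column so (1/n) X^T X is the covariance matrix

C = X.T @ X / n
# np.cov with bias=True divides by n; rowvar=False means columns are the variables
assert np.allclose(C, np.cov(X, rowvar=False, bias=True))

# Eigenvalues of C equal sigma_i^2 / n (eigvalsh returns them in ascending order)
sigma = np.linalg.svd(X, compute_uv=False)
print(np.allclose(np.sort(sigma ** 2 / n), np.linalg.eigvalsh(C)))  # True

# Square symmetric (self-adjoint) matrix: singular values = |eigenvalues|
B = rng.normal(size=(4, 4))
A = B + B.T
print(np.allclose(np.sort(np.linalg.svd(A, compute_uv=False)),
                  np.sort(np.abs(np.linalg.eigvalsh(A)))))  # True
```

For the symmetric matrix `A` some eigenvalues are typically negative, which is why the comparison uses their absolute values.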