Week 35: From Ordinary Linear Regression to Ridge and Lasso Regression

Interpreting the Ridge results

Since $ \lambda \geq 0 $, the factor multiplying each term in the Ridge result satisfies, compared to OLS (where $ \lambda = 0 $ and the factor equals one),

$$ \frac{\sigma_j^2}{\sigma_j^2+\lambda} \leq 1. $$

Ridge regression finds the coordinates of $ \boldsymbol{y} $ with respect to the orthonormal basis given by the columns of $ \boldsymbol{U} $ and then shrinks the $ j $-th coordinate by the factor $ \frac{\sigma_j^2}{\sigma_j^2+\lambda} $. Recall that the SVD orders the singular values in descending order, that is $ \sigma_i \geq \sigma_{i+1} $.
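As a small numerical illustration of this statement, the sketch below compares the Ridge prediction built from the SVD with the one obtained by solving the Ridge normal equations directly. The random design matrix, the target vector and the value `lam = 2.0` are assumptions made only for this example; they do not come from the lecture material.

```python
import numpy as np

# Hypothetical data, used only to illustrate the shrinkage
rng = np.random.default_rng(0)
n, p = 20, 5
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
lam = 2.0  # assumed value of the Ridge parameter lambda

# Thin SVD: X = U diag(s) V^T, with s returned in descending order
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Shrinkage factors sigma_j^2 / (sigma_j^2 + lambda)
shrink = s**2 / (s**2 + lam)

# Coordinates of y in the basis given by the columns of U
coords = U.T @ y

# OLS keeps the coordinates as they are, Ridge shrinks them
y_ols_svd = U @ coords
y_ridge_svd = U @ (shrink * coords)

# Ridge prediction from the normal equations, for comparison
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
y_ridge_direct = X @ beta_ridge

print("singular values  :", s)
print("shrinkage factors:", shrink)
print("SVD and direct Ridge predictions agree:",
      np.allclose(y_ridge_svd, y_ridge_direct))
```

The factors are all at most one and follow the ordering of the singular values, so the directions with the smallest $ \sigma_j $ are shrunk the most.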

For small singular values $ \sigma_j $ (more precisely, when $ \sigma_j^2 \ll \lambda $) the shrinkage factor is close to zero, so the corresponding contributions become less important, a fact which can be used to reduce the effective number of degrees of freedom. More on this once we have covered the statistical interpretation of the various linear regression methods.
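To make the link to degrees of freedom a bit more concrete, the sketch below sums the shrinkage factors for a few values of $ \lambda $. Calling this sum the effective number of degrees of freedom is a common convention (it equals the trace of the matrix that maps $ \boldsymbol{y} $ to the Ridge prediction), but it is an assumption here and not something defined in the text above. The data, the helper `shrinkage_factors` and the list of $ \lambda $ values are likewise hypothetical.

```python
import numpy as np

# Hypothetical design matrix with four features
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4))

# Singular values only, in descending order
s = np.linalg.svd(X, compute_uv=False)


def shrinkage_factors(s, lam):
    """Return sigma_j^2 / (sigma_j^2 + lambda) for each singular value."""
    return s**2 / (s**2 + lam)


lambdas = [0.0, 1.0, 10.0, 100.0]  # assumed values, for illustration only
dofs = []
for lam in lambdas:
    factors = shrinkage_factors(s, lam)
    # Sum of the factors: effective degrees of freedom (assumed definition)
    dofs.append(factors.sum())
    print(f"lambda = {lam:6.1f}  factors = {np.round(factors, 3)}  "
          f"sum = {dofs[-1]:.3f}")
```

For $ \lambda = 0 $ all factors equal one and the sum is the number of features, as in OLS; as $ \lambda $ grows, the factors belonging to the smallest singular values drop towards zero first and the sum decreases.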