Interpreting the Ridge results

Since \( \lambda \geq 0 \), each term of the Ridge solution is multiplied, relative to OLS, by a factor that satisfies

$$ \frac{\sigma_j^2}{\sigma_j^2+\lambda} \leq 1. $$

Ridge regression finds the coordinates of \( \boldsymbol{y} \) with respect to the orthonormal basis given by the columns of \( \boldsymbol{U} \) and then shrinks each coordinate by the factor \( \frac{\sigma_j^2}{\sigma_j^2+\lambda} \). Recall that the SVD orders the singular values in descending order, that is \( \sigma_i \geq \sigma_{i+1} \).
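The shrinkage picture can be checked numerically. The following is a minimal sketch, assuming a random design matrix `X`, a random target `y` and an arbitrarily chosen `lam = 2.0` (none of these come from the text); it compares the Ridge prediction built from the SVD with the one obtained by solving the regularized normal equations directly.

```python
import numpy as np

# Illustrative data only: sizes, seed and lam are assumptions.
rng = np.random.default_rng(0)
n, p = 50, 5
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
lam = 2.0

# Thin SVD: X = U diag(s) Vt, with s sorted in descending order.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Shrinkage factor sigma_j^2 / (sigma_j^2 + lambda) for each direction.
shrink = s**2 / (s**2 + lam)

# Coordinates of y in the basis U, shrunk, then mapped back.
y_ridge_svd = U @ (shrink * (U.T @ y))

# Same prediction from the regularized normal equations.
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
y_ridge_direct = X @ beta_ridge

# OLS corresponds to lam = 0, i.e. no shrinkage of the coordinates.
y_ols = U @ (U.T @ y)

print("shrinkage factors:", shrink)
print("SVD and direct Ridge agree:", np.allclose(y_ridge_svd, y_ridge_direct))
```

Because the singular values are sorted in descending order, the shrinkage factors are too: the directions with the smallest \( \sigma_j \) are shrunk the most.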

For small singular values \( \sigma_j \), that is when \( \sigma_j^2 \ll \lambda \), the shrinkage factor is close to zero and the corresponding contributions become less important, a fact which can be used to reduce the number of degrees of freedom. We return to this after covering the material on a statistical interpretation of the various linear regression methods.
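One common way to quantify this reduction, stated here as an assumption since the statistical discussion comes later, is the effective number of degrees of freedom, defined as the sum of the shrinkage factors \( \sum_j \sigma_j^2/(\sigma_j^2+\lambda) \). The sketch below uses a hypothetical helper `effective_dof` and made-up singular values to show how this sum behaves as \( \lambda \) grows.

```python
import numpy as np

def effective_dof(singular_values, lam):
    """Sum of the Ridge shrinkage factors sigma_j^2 / (sigma_j^2 + lam).

    Hypothetical helper for illustration; not part of any library.
    """
    s2 = np.asarray(singular_values, dtype=float) ** 2
    return float(np.sum(s2 / (s2 + lam)))

# Made-up singular values, sorted in descending order.
s = np.array([10.0, 3.0, 1.0, 0.1])

for lam in (0.0, 1.0, 100.0):
    print(f"lambda = {lam:6.1f}  effective dof = {effective_dof(s, lam):.3f}")
```

With \( \lambda = 0 \) and all singular values nonzero, every factor equals one and the sum is simply the number of singular values, which is the OLS case. Increasing \( \lambda \) lowers the sum, with the smallest singular values losing their contribution first.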