Comparison with OLS

When we compare this with the ordinary least squares result, we have

$$
\hat{\boldsymbol{\beta}}_{\mathrm{OLS}} = \left(\boldsymbol{X}^T\boldsymbol{X}\right)^{-1}\boldsymbol{X}^T\boldsymbol{y},
$$

which can lead to singular matrices, since $\boldsymbol{X}^T\boldsymbol{X}$ may not be invertible. However, with the SVD we can always compute the (pseudo)inverse of the matrix $\boldsymbol{X}^T\boldsymbol{X}$, even in the singular case.
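To illustrate this point, here is a minimal NumPy sketch that contrasts the normal-equation solution with an SVD-based pseudoinverse. The design matrix `X`, the response `y`, and the tolerance used below are made-up assumptions for illustration only; they are not taken from the text.

```python
import numpy as np

# Minimal sketch: OLS via the normal equations versus an SVD-based
# pseudoinverse. X and y are made up purely for illustration.
rng = np.random.default_rng(0)
n, p = 100, 5
X = rng.normal(size=(n, p))
X[:, 4] = X[:, 3]                      # duplicate a column so X^T X is singular
y = X @ np.arange(1, p + 1) + 0.1 * rng.normal(size=n)

# Normal equations: fail (or become numerically unstable) when X^T X is singular.
try:
    beta_normal = np.linalg.solve(X.T @ X, X.T @ y)
except np.linalg.LinAlgError:
    beta_normal = None                 # singular matrix, no unique solution

# SVD-based solution: always defined, because singular values below a
# tolerance are simply zeroed out (this is what np.linalg.pinv does).
U, s, Vt = np.linalg.svd(X, full_matrices=False)
cutoff = s.max() * 1e-12               # illustrative tolerance, similar in spirit to pinv's rcond
s_inv = np.where(s > cutoff, 1.0 / s, 0.0)
beta_svd = Vt.T @ (s_inv * (U.T @ y))  # comparable to np.linalg.pinv(X) @ y

print("normal equations:", beta_normal)
print("SVD / pseudoinverse:", beta_svd)
```

Even though the normal equations break down for the deliberately rank-deficient design matrix, the SVD route still returns a well-defined solution.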

We see that Ridge regression is nothing but standard OLS with a modified diagonal term added to $\boldsymbol{X}^T\boldsymbol{X}$. The consequences, in particular for our discussion of the bias-variance tradeoff, are rather interesting. We will see that for specific values of $\lambda$ we may even reduce the variance of the optimal parameters $\boldsymbol{\beta}$. These topics, and other related ones, will be discussed after the more linear-algebra-oriented analysis here.
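To make the "modified diagonal term" concrete, here is a minimal sketch in the same NumPy style, solving the Ridge normal equations $(\boldsymbol{X}^T\boldsymbol{X}+\lambda\boldsymbol{I})\boldsymbol{\beta}=\boldsymbol{X}^T\boldsymbol{y}$ for a few values of $\lambda$. The data and the particular $\lambda$ values are assumptions chosen only for illustration.

```python
import numpy as np

# Minimal sketch: Ridge regression as OLS with lambda added to the diagonal
# of X^T X. X, y and the lambda values are made up purely for illustration.
rng = np.random.default_rng(1)
n, p = 100, 5
X = rng.normal(size=(n, p))
y = X @ np.arange(1, p + 1) + 0.1 * rng.normal(size=n)

XtX = X.T @ X
Xty = X.T @ y
for lam in [0.0, 0.1, 1.0, 10.0]:      # lam = 0 recovers plain OLS
    beta_ridge = np.linalg.solve(XtX + lam * np.eye(p), Xty)
    print(f"lambda = {lam:5.1f}  ||beta|| = {np.linalg.norm(beta_ridge):.4f}")
```

As $\lambda$ grows, the norm of the estimated parameters shrinks, which is the shrinkage effect behind the variance reduction mentioned above.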