Deriving the Ridge Regression Equations

Using the matrix-vector expression for Ridge regression, and dropping the factor $1/n$ in front of the standard mean squared error, we have

$$
C(\boldsymbol{X},\boldsymbol{\beta})=\left\{(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta})^T(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta})\right\}+\lambda\boldsymbol{\beta}^T\boldsymbol{\beta},
$$

and taking the derivative with respect to $\boldsymbol{\beta}$ we obtain a slightly modified matrix-inversion problem which, for finite values of $\lambda$, does not suffer from singularity problems. We obtain the optimal parameters
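For completeness, the intermediate step is to set the gradient of the cost function to zero, which gives the Ridge normal equations:

$$
\frac{\partial C}{\partial \boldsymbol{\beta}} = -2\boldsymbol{X}^T\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta}\right) + 2\lambda\boldsymbol{\beta} = \boldsymbol{0}
\quad\Longrightarrow\quad
\left(\boldsymbol{X}^T\boldsymbol{X}+\lambda\boldsymbol{I}\right)\boldsymbol{\beta} = \boldsymbol{X}^T\boldsymbol{y}.
$$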

$$
\hat{\boldsymbol{\beta}}_{\mathrm{Ridge}} = \left(\boldsymbol{X}^T\boldsymbol{X}+\lambda\boldsymbol{I}\right)^{-1}\boldsymbol{X}^T\boldsymbol{y},
$$

with $\boldsymbol{I}$ being the $p\times p$ identity matrix. This solution is equivalent to minimizing the mean squared error subject to the constraint

$$
\sum_{i=0}^{p-1} \beta_i^2 \leq t,
$$

with $t$ a finite positive number.
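As a minimal numerical sketch (assuming NumPy and synthetic data invented here for illustration), the closed-form expression above can be evaluated even when $\boldsymbol{X}^T\boldsymbol{X}$ is singular, for instance when a column of the design matrix is duplicated:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 50
x = rng.normal(size=(n, 1))
# Duplicating the column makes X^T X rank-deficient (singular),
# so the ordinary least-squares normal equations cannot be solved.
X = np.hstack([x, x])
y = 3.0 * x[:, 0] + 0.1 * rng.normal(size=n)

lam = 1e-3  # any lambda > 0 makes X^T X + lambda*I invertible
p = X.shape[1]
# Closed-form Ridge solution: (X^T X + lambda I)^{-1} X^T y,
# computed via a linear solve rather than an explicit inverse.
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
```

By symmetry, Ridge splits the weight evenly between the two identical columns, so the two coefficients are equal and their sum is close to the true slope of 3; plain least squares would fail here because $\boldsymbol{X}^T\boldsymbol{X}$ has no inverse.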