Deriving the Ridge Regression Equations

Using the matrix-vector expression for Ridge regression, and dropping the factor $1/n$ in front of the standard mean squared error, we have

$$
C(\boldsymbol{X},\boldsymbol{\beta})=\left\{(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta})^T(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta})\right\}+\lambda\boldsymbol{\beta}^T\boldsymbol{\beta},
$$

and taking the derivative with respect to $\boldsymbol{\beta}$ we obtain a slightly modified matrix-inversion problem which, for finite values of $\lambda$, does not suffer from singularity problems. We obtain the optimal parameters
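For completeness, the intermediate step is to set the gradient of the cost function to zero, which gives the Ridge normal equations:

$$
\frac{\partial C}{\partial \boldsymbol{\beta}} = -2\boldsymbol{X}^T\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta}\right) + 2\lambda\boldsymbol{\beta} = \boldsymbol{0}
\quad\Longrightarrow\quad
\left(\boldsymbol{X}^T\boldsymbol{X}+\lambda\boldsymbol{I}\right)\boldsymbol{\beta} = \boldsymbol{X}^T\boldsymbol{y}.
$$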

$$
\hat{\boldsymbol{\beta}}_{\mathrm{Ridge}} = \left(\boldsymbol{X}^T\boldsymbol{X}+\lambda\boldsymbol{I}\right)^{-1}\boldsymbol{X}^T\boldsymbol{y},
$$

with $\boldsymbol{I}$ being the $p\times p$ identity matrix. This solution is equivalent to minimizing the mean squared error subject to the constraint

$$
\sum_{i=0}^{p-1} \beta_i^2 \leq t,
$$

with $t$ a finite positive number.
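As a minimal numerical sketch (assuming NumPy and synthetic data invented here for illustration), the closed-form expression above can be evaluated even when $\boldsymbol{X}^T\boldsymbol{X}$ is singular, for instance when a column of the design matrix is duplicated:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 50
x = rng.normal(size=(n, 1))
# Duplicating the column makes X^T X rank-deficient (singular),
# so the ordinary least-squares normal equations cannot be solved.
X = np.hstack([x, x])
y = 3.0 * x[:, 0] + 0.1 * rng.normal(size=n)

lam = 1e-3  # any lambda > 0 makes X^T X + lambda*I invertible
p = X.shape[1]
# Closed-form Ridge solution: (X^T X + lambda I)^{-1} X^T y,
# computed via a linear solve rather than an explicit inverse.
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
```

By symmetry, Ridge splits the weight evenly between the two identical columns, so the two coefficients are equal and their sum is close to the true slope of 3; plain least squares would fail here because $\boldsymbol{X}^T\boldsymbol{X}$ has no inverse.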