Deriving the Lasso Regression Equations

Using the matrix-vector expression for Lasso regression, we have the following cost function

C(\boldsymbol{X},\boldsymbol{\beta})=\frac{1}{n}\left\{(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta})^T(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta})\right\}+\lambda\vert\vert\boldsymbol{\beta}\vert\vert_1.
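As a concrete illustration, a minimal NumPy sketch of this cost function could read as follows; the function and variable names (lasso_cost, lam, and so on) are our own choices for this example.

import numpy as np

def lasso_cost(X, y, beta, lam):
    # (1/n) (y - X beta)^T (y - X beta) + lam * ||beta||_1
    n = len(y)
    residual = y - X @ beta
    return residual @ residual / n + lam * np.sum(np.abs(beta))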

Taking the derivative with respect to \boldsymbol{\beta} and recalling that the derivative of the absolute value is given by the sign function (we drop the boldfaced vector symbol for simplicity; note that the derivative is not defined at \beta = 0)

\frac{d \vert \beta\vert}{d \beta}=\mathrm{sgn}(\beta)=\left\{\begin{array}{cc} 1 & \beta > 0 \\-1 & \beta < 0, \end{array}\right.

we have that the derivative of the cost function is

\frac{\partial C(\boldsymbol{X},\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}=-\frac{2}{n}\boldsymbol{X}^T(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta})+\lambda\,\mathrm{sgn}(\boldsymbol{\beta})=0,

and, absorbing the factor 2/n into a redefinition of the parameter \lambda and reordering, we have

\boldsymbol{X}^T\boldsymbol{X}\boldsymbol{\beta}+\lambda\,\mathrm{sgn}(\boldsymbol{\beta})=\boldsymbol{X}^T\boldsymbol{y}.
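To see the derivative in action, here is a naive subgradient-descent sketch built directly on the expression above. It uses numpy.sign for \mathrm{sgn} (which returns 0 at \beta = 0, one valid subgradient choice); the synthetic data, step size, and iteration count are invented for this illustration.

import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 5
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.0, 0.0, 0.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=n)

lam = 0.1
beta = np.zeros(p)
eta = 0.01  # ad hoc step size for this sketch
for _ in range(10000):
    # Subgradient of the cost: -(2/n) X^T (y - X beta) + lam * sgn(beta)
    gradient = -(2.0 / n) * X.T @ (y - X @ beta) + lam * np.sign(beta)
    beta -= eta * gradient

print(beta)  # entries that are zero in beta_true shrink towards zero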

The condition above does not lead to a nice closed-form analytical solution as in Ridge regression or ordinary least squares, since \mathrm{sgn}(\boldsymbol{\beta}) depends on the signs of the unknown parameters themselves. We will instead solve this type of problem numerically, using libraries like scikit-learn.
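A minimal scikit-learn sketch could look as follows. Note that scikit-learn's Lasso minimizes \frac{1}{2n}\vert\vert\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta}\vert\vert_2^2+\alpha\vert\vert\boldsymbol{\beta}\vert\vert_1 (internally via coordinate descent), so its \alpha plays the role of our \lambda up to the constant factors discussed above; the data here are synthetic and chosen only for illustration.

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
beta_true = np.array([2.0, -1.0, 0.0, 0.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=100)

# fit_intercept=False since our derivation contains no intercept term
model = Lasso(alpha=0.1, fit_intercept=False)
model.fit(X, y)
print(model.coef_)  # unlike the naive sketch above, coefficients whose true value is zero are typically driven exactly to zero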