Lasso Regression

For Lasso regression our cost function is

$$ C(\boldsymbol{\theta})=\sum_{i=0}^{p-1}(y_i-\theta_i)^2+\lambda\sum_{i=0}^{p-1}\vert\theta_i\vert=\sum_{i=0}^{p-1}(y_i-\theta_i)^2+\lambda\sum_{i=0}^{p-1}\sqrt{\theta_i^2}, $$

and minimizing with respect to each \( \theta_i \) (for \( \theta_i\ne 0 \), where the derivative of \( \vert\theta_i\vert \) exists) we have that

$$ -2(y_i-\theta_i)+\lambda \frac{\theta_i}{\vert\theta_i\vert}=0, $$
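
where each component can be treated separately. As a sketch of the intermediate step, introduce the sign function \( \mathrm{sgn}(\theta_i)=\theta_i/\vert\theta_i\vert \) (a notation not used in the text above) and assume \( \theta_i\ne 0 \), so that the condition for component \( i \) reads

$$ \theta_i = y_i-\frac{\lambda}{2}\,\mathrm{sgn}(\theta_i). $$

A positive solution \( \theta_i=y_i-\lambda/2 \) is consistent only if \( y_i>\lambda/2 \), and a negative solution \( \theta_i=y_i+\lambda/2 \) only if \( y_i<-\lambda/2 \). For \( \vert y_i\vert\le\lambda/2 \) neither case is consistent, and the minimum sits at the kink \( \theta_i=0 \),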

which leads to

$$ \hat{\theta}_i^{\mathrm{Lasso}} = \left\{\begin{array}{ccc}y_i-\frac{\lambda}{2} &\mathrm{if} & y_i> \frac{\lambda}{2}\\ y_i+\frac{\lambda}{2} &\mathrm{if} & y_i < -\frac{\lambda}{2}\\ 0 &\mathrm{if} & \vert y_i\vert\le \frac{\lambda}{2}\end{array}\right. $$
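
The piecewise result above can be checked with a few lines of Python. This is only a minimal sketch: the function name `soft_threshold` and the sample values are illustrative choices, not part of the original notes, and the code assumes the simple cost function given above.

```python
def soft_threshold(y, lam):
    """Closed-form Lasso estimate for a single component y_i,
    assuming the cost function (y_i - theta_i)^2 + lam * |theta_i|."""
    half = lam / 2.0
    if y > half:
        return y - half   # shifted down by lam/2
    if y < -half:
        return y + half   # shifted up by lam/2
    return 0.0            # |y| <= lam/2: the parameter is set to zero


# Illustrative (made-up) data and regularization strength
y_values = [-2.0, -0.4, 0.0, 0.3, 1.5]
lam = 1.0

theta_lasso = [soft_threshold(y, lam) for y in y_values]
print(theta_lasso)  # [-1.5, 0.0, 0.0, 0.0, 1.0]
```

All inputs with \( \vert y_i\vert\le\lambda/2 \) are mapped to zero, while the remaining ones are shifted toward zero by \( \lambda/2 \).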

Plotting these results shows clearly that Lasso regression suppresses (sets exactly to zero) every \( \theta_i \) for which \( \vert y_i\vert \le \lambda/2 \), so that more parameters are removed as \( \lambda \) grows. Ridge regression, on the other hand, only reduces the magnitude of \( \theta_i \) as a function of \( \lambda \).
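
The difference between the two methods can be illustrated numerically. The sketch below assumes that the Ridge cost function has the same form as the Lasso one with the penalty \( \lambda\sum_i\theta_i^2 \), which gives \( \theta_i=y_i/(1+\lambda) \). The function names and the chosen values of \( y \) and \( \lambda \) are hypothetical and serve only as an example.

```python
def lasso_estimate(y, lam):
    """Lasso solution for one component (soft thresholding at lam/2)."""
    half = lam / 2.0
    if y > half:
        return y - half
    if y < -half:
        return y + half
    return 0.0


def ridge_estimate(y, lam):
    """Ridge solution for one component, assuming the cost
    (y_i - theta_i)^2 + lam * theta_i^2 (an assumption, see lead-in)."""
    return y / (1.0 + lam)


# Follow a single component y = 1 as the penalty increases
y = 1.0
for lam in [0.0, 0.5, 1.0, 2.0, 4.0]:
    print(f"lambda={lam:3.1f}  "
          f"lasso={lasso_estimate(y, lam):.3f}  "
          f"ridge={ridge_estimate(y, lam):.3f}")
```

For this example the Lasso estimate reaches exactly zero once \( \lambda\ge 2 \), whereas the Ridge estimate keeps shrinking but stays nonzero for every finite \( \lambda \).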