Week 36: Linear Regression and Gradient descent

Lasso Regression

For Lasso regression our cost function is

$$ C(\boldsymbol{\theta})=\sum_{i=0}^{p-1}(y_i-\theta_i)^2+\lambda\sum_{i=0}^{p-1}\vert\theta_i\vert=\sum_{i=0}^{p-1}(y_i-\theta_i)^2+\lambda\sum_{i=0}^{p-1}\sqrt{\theta_i^2}, $$

and minimizing with respect to each $ \theta_i $ (for $ \theta_i\ne 0 $) we have that

$$ -2(y_i-\theta_i)+\lambda \frac{\theta_i}{\vert\theta_i\vert}=0, $$

which leads to

$$ \hat{\theta}_i^{\mathrm{Lasso}} = \left\{\begin{array}{ccc}y_i-\frac{\lambda}{2} &\mathrm{if} & y_i> \frac{\lambda}{2}\\ y_i+\frac{\lambda}{2} &\mathrm{if} & y_i < -\frac{\lambda}{2}\\ 0 &\mathrm{if} & \vert y_i\vert\le \frac{\lambda}{2}\end{array}\right. $$
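
The piecewise result above is often called soft thresholding. As a minimal sketch, it can be written in one line and checked against a brute-force minimization of the per-component cost $ (y_i-\theta_i)^2+\lambda\vert\theta_i\vert $ on a grid. The function names `lasso_identity` and `lasso_cost`, the data `y`, and the value of `lam` are illustrative assumptions, not part of the original notes.

```python
import numpy as np

def lasso_identity(y, lam):
    """Closed-form Lasso solution for the cost above (hypothetical helper).

    Shifts each y_i toward zero by lam/2 and sets it to zero if |y_i| <= lam/2.
    """
    y = np.asarray(y, dtype=float)
    return np.sign(y) * np.maximum(np.abs(y) - lam / 2.0, 0.0)

def lasso_cost(theta, y_i, lam):
    """Per-component cost (y_i - theta)^2 + lam * |theta|."""
    return (y_i - theta) ** 2 + lam * np.abs(theta)

# Made-up data, chosen so that all three branches of the formula occur.
y = np.array([-2.0, -0.3, 0.0, 0.4, 1.5])
lam = 1.0

theta_closed = lasso_identity(y, lam)

# Brute-force check: minimize the cost for each y_i on a fine grid (step 0.001).
grid = np.linspace(-3.0, 3.0, 6001)
theta_grid = np.array([grid[np.argmin(lasso_cost(grid, y_i, lam))] for y_i in y])

print("closed form:", theta_closed)
print("grid search:", theta_grid)
print("agree:", np.allclose(theta_closed, theta_grid, atol=1e-3))
```

The grid search does not rely on the derivative, so it also confirms the zero branch, where the cost is not differentiable.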

Plotting these results shows clearly that Lasso regression sets $ \theta_i $ exactly to zero whenever $ \vert y_i\vert\le \lambda/2 $, and otherwise shifts it toward zero by the constant amount $ \lambda/2 $. Ridge regression, on the other hand, shrinks the values of $ \theta_i $ smoothly as a function of $ \lambda $ without setting them exactly to zero.
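
The contrast between the two methods can be illustrated numerically with a small sketch. It assumes the Ridge cost for the same setup is $ \sum_i (y_i-\theta_i)^2+\lambda\sum_i\theta_i^2 $, whose minimum is $ \theta_i = y_i/(1+\lambda) $. The helper names, the data `y`, and the list `lambdas` are hypothetical choices made for illustration.

```python
import numpy as np

def lasso_identity(y, lam):
    """Lasso solution for this setup: soft thresholding at lam/2."""
    return np.sign(y) * np.maximum(np.abs(y) - lam / 2.0, 0.0)

def ridge_identity(y, lam):
    """Ridge solution under the assumed cost sum (y_i - theta_i)^2 + lam * sum theta_i^2."""
    return y / (1.0 + lam)

# Made-up data and penalty strengths.
y = np.array([-2.0, -0.3, 0.4, 1.5])
lambdas = [0.0, 0.5, 1.0, 4.0]

for lam in lambdas:
    print(f"lambda = {lam}")
    print("  Lasso:", lasso_identity(y, lam))
    print("  Ridge:", ridge_identity(y, lam))

# Count how many parameters each method sets exactly to zero for each lambda.
lasso_zeros = [int(np.sum(lasso_identity(y, lam) == 0.0)) for lam in lambdas]
ridge_zeros = [int(np.sum(ridge_identity(y, lam) == 0.0)) for lam in lambdas]
print("Lasso zeros per lambda:", lasso_zeros)
print("Ridge zeros per lambda:", ridge_zeros)
```

For this data the number of Lasso parameters that are exactly zero grows with $ \lambda $, while the Ridge parameters only become smaller in magnitude.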