Minimizing the above equation with respect to the parameters \boldsymbol{\beta} yields an analytical expression for them. We can add a regularization parameter \lambda by defining a new cost function to be optimized, that is
{\displaystyle \min_{\boldsymbol{\beta}\in {\mathbb{R}}^{p}}}\frac{1}{n}\vert\vert \boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta}\vert\vert_2^2+\lambda\vert\vert \boldsymbol{\beta}\vert\vert_2^2,
which leads to the Ridge regression minimization problem. Equivalently, one may require that \vert\vert \boldsymbol{\beta}\vert\vert_2^2\le t , where t is a finite number larger than zero. We do not include such a constraint in the discussions here.
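Setting the gradient of the Ridge cost function to zero gives the closed-form solution \boldsymbol{\beta} = (\boldsymbol{X}^T\boldsymbol{X}+n\lambda \boldsymbol{I})^{-1}\boldsymbol{X}^T\boldsymbol{y}, where the factor n arises from the 1/n in front of the squared error. A minimal numpy sketch, with synthetic data and an illustrative value of \lambda chosen here for demonstration only, could look as follows:

```python
import numpy as np

# Synthetic data: these sizes, coefficients, and the seed are
# illustrative choices, not taken from the text.
rng = np.random.default_rng(2024)
n, p = 100, 5
X = rng.normal(size=(n, p))
beta_true = np.array([3.0, -2.0, 0.0, 1.5, 0.0])
y = X @ beta_true + 0.1 * rng.normal(size=n)

lmbda = 0.1  # illustrative regularization parameter

# Ordinary least squares (lambda = 0) versus the Ridge solution
# beta_ridge = (X^T X + n*lambda*I)^{-1} X^T y
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
beta_ridge = np.linalg.solve(X.T @ X + n * lmbda * np.eye(p), X.T @ y)

print("OLS   norm:", np.linalg.norm(beta_ols))
print("Ridge norm:", np.linalg.norm(beta_ridge))
```

The penalty term shrinks the parameters towards zero, so the norm of the Ridge solution is smaller than that of the ordinary least-squares solution.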
By defining
C(\boldsymbol{X},\boldsymbol{\beta})=\frac{1}{n}\vert\vert \boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta}\vert\vert_2^2+\lambda\vert\vert \boldsymbol{\beta}\vert\vert_1,
we have a new optimization problem
{\displaystyle \min_{\boldsymbol{\beta}\in {\mathbb{R}}^{p}}}\frac{1}{n}\vert\vert \boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta}\vert\vert_2^2+\lambda\vert\vert \boldsymbol{\beta}\vert\vert_1,
which leads to Lasso regression. Lasso stands for least absolute shrinkage and selection operator.
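Unlike Ridge, the Lasso cost function has no closed-form solution, since the norm-1 penalty is not differentiable at zero. A standard approach is coordinate descent: minimizing the cost over one parameter \beta_j at a time, with the others held fixed, gives a soft-thresholding update which can set parameters exactly to zero. The sketch below is one simple way to implement this for the cost function above (the data, seed, and value of \lambda are illustrative assumptions, and no convergence check is included):

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lmbda, n_iter=200):
    """Coordinate descent for (1/n)||y - X beta||_2^2 + lambda ||beta||_1.

    Minimizing over beta_j alone yields
        beta_j = S(x_j^T r_j, n*lambda/2) / (x_j^T x_j),
    where r_j is the residual with column j's contribution removed.
    """
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r_j = y - X @ beta + X[:, j] * beta[j]  # residual without column j
            beta[j] = soft_threshold(X[:, j] @ r_j, n * lmbda / 2) / (X[:, j] @ X[:, j])
    return beta

# Illustrative synthetic data with two coefficients that are truly zero
rng = np.random.default_rng(42)
n, p = 100, 5
X = rng.normal(size=(n, p))
beta_true = np.array([3.0, -2.0, 0.0, 1.5, 0.0])
y = X @ beta_true + 0.1 * rng.normal(size=n)

beta_lasso = lasso_cd(X, y, lmbda=0.5)
print(beta_lasso)
```

Note how the soft-thresholding step drives some components of \boldsymbol{\beta} exactly to zero, which is the "selection" part of the Lasso: it performs variable selection as a by-product of the fit.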
Here we have defined the norm-1 as
\vert\vert \boldsymbol{x}\vert\vert_1 = \sum_i \vert x_i\vert.