Bayes' Theorem and Ridge and Lasso Regression

Using Bayes' theorem we can gain a better intuition about Ridge and Lasso regression.

For ordinary least squares we postulated that the likelihood for the domain of events \( \boldsymbol{D} \) (one-dimensional case)

$$ \boldsymbol{D}=[(x_0,y_0), (x_1,y_1),\dots, (x_{n-1},y_{n-1})], $$

is given by

$$ p(\boldsymbol{D}\vert\boldsymbol{\beta})=\prod_{i=0}^{n-1}\frac{1}{\sqrt{2\pi\sigma^2}}\exp{\left[-\frac{(y_i-\boldsymbol{X}_{i,*}\boldsymbol{\beta})^2}{2\sigma^2}\right]}. $$
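Taking the negative logarithm of this likelihood makes the link to the ordinary least-squares cost function explicit,

$$ -\log{p(\boldsymbol{D}\vert\boldsymbol{\beta})}=\frac{1}{2\sigma^2}\sum_{i=0}^{n-1}\left(y_i-\boldsymbol{X}_{i,*}\boldsymbol{\beta}\right)^2+\frac{n}{2}\log{\left(2\pi\sigma^2\right)}, $$

so that maximizing the likelihood with respect to \( \boldsymbol{\beta} \) is equivalent to minimizing the sum of squared residuals.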

In Bayes' theorem this function plays the role of the so-called likelihood. We can now ask: what is the posterior probability of a parameter set \( \boldsymbol{\beta} \) given a domain of events \( \boldsymbol{D} \)? That is, how can we define the posterior probability

$$ p(\boldsymbol{\beta}\vert\boldsymbol{D}). $$

Bayes' theorem comes to our rescue here since (omitting the normalization constant)

$$ p(\boldsymbol{\beta}\vert\boldsymbol{D})\propto p(\boldsymbol{D}\vert\boldsymbol{\beta})p(\boldsymbol{\beta}). $$
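Taking the negative logarithm of this relation (the omitted normalization contributes only a constant independent of \( \boldsymbol{\beta} \)) shows why the choice of prior matters for regression,

$$ -\log{p(\boldsymbol{\beta}\vert\boldsymbol{D})}=-\log{p(\boldsymbol{D}\vert\boldsymbol{\beta})}-\log{p(\boldsymbol{\beta})}+\mathrm{constant}, $$

so that maximizing the posterior over \( \boldsymbol{\beta} \) amounts to minimizing the least-squares cost from the likelihood plus a penalty term \( -\log{p(\boldsymbol{\beta})} \) determined entirely by the prior.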

We have a model for \( p(\boldsymbol{D}\vert\boldsymbol{\beta}) \) but need one for the prior \( p(\boldsymbol{\beta}) \)!
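To make the structure of the posterior concrete numerically, the following is a minimal sketch (not part of the notes; all function names and parameters are illustrative) that evaluates the unnormalized log-posterior \( \log{p(\boldsymbol{D}\vert\boldsymbol{\beta})}+\log{p(\boldsymbol{\beta})} \) for a linear model. The zero-mean Gaussian prior used here is only an assumed placeholder; the whole point of the discussion is that the choice of prior is what leads to, for instance, Ridge or Lasso regression.

```python
import numpy as np

def log_likelihood(beta, X, y, sigma2=1.0):
    """Gaussian log-likelihood log p(D|beta) for the linear model y = X beta + noise."""
    resid = y - X @ beta
    n = len(y)
    return -0.5 * np.sum(resid**2) / sigma2 - 0.5 * n * np.log(2.0 * np.pi * sigma2)

def log_prior(beta, tau2=1.0):
    """Assumed zero-mean Gaussian prior log p(beta); other priors can be swapped in here."""
    p = len(beta)
    return -0.5 * np.sum(beta**2) / tau2 - 0.5 * p * np.log(2.0 * np.pi * tau2)

def log_posterior(beta, X, y, sigma2=1.0, tau2=1.0):
    """Unnormalized log-posterior: log p(beta|D) = log p(D|beta) + log p(beta) + const."""
    return log_likelihood(beta, X, y, sigma2) + log_prior(beta, tau2)

# Small illustration on synthetic data
rng = np.random.default_rng(2021)
X = rng.normal(size=(100, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=100)
print(log_posterior(beta_true, X, y))
```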