Bayes' Theorem and Ridge and Lasso Regression

Using Bayes' theorem we can gain a better intuition about Ridge and Lasso regression.

For ordinary least squares we postulated that the likelihood for the domain of events \boldsymbol{D} (one-dimensional case)

\boldsymbol{D}=[(x_0,y_0), (x_1,y_1),\dots, (x_{n-1},y_{n-1})],

is given by

p(\boldsymbol{D}\vert\boldsymbol{\beta})=\prod_{i=0}^{n-1}\frac{1}{\sqrt{2\pi\sigma^2}}\exp{\left[-\frac{(y_i-\boldsymbol{X}_{i,*}\boldsymbol{\beta})^2}{2\sigma^2}\right]}.
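To make this concrete, here is a minimal numerical sketch (the data, variable names and noise level are invented for illustration) showing that the \boldsymbol{\beta} which maximizes this Gaussian likelihood, or equivalently minimizes its negative logarithm, is the ordinary least squares solution.

```python
import numpy as np

# Illustrative data: y_i = X_{i,*} beta + Gaussian noise (assumed setup)
rng = np.random.default_rng(0)
n = 100
x = np.linspace(0, 1, n)
X = np.column_stack([np.ones(n), x])      # design matrix with intercept
beta_true = np.array([1.0, 2.0])
sigma = 0.5
y = X @ beta_true + rng.normal(0, sigma, n)

def neg_log_likelihood(beta, X, y, sigma):
    """Negative log of prod_i N(y_i | X_{i,*} beta, sigma^2), the likelihood above."""
    resid = y - X @ beta
    return 0.5 * y.size * np.log(2 * np.pi * sigma**2) + np.sum(resid**2) / (2 * sigma**2)

# OLS solution from the normal equations
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# The OLS estimate attains the smallest negative log-likelihood
print(neg_log_likelihood(beta_ols, X, y, sigma))
print(neg_log_likelihood(beta_true, X, y, sigma))  # never smaller than the line above
```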

In Bayes' theorem this function plays the role of the so-called likelihood. We can now ask: what is the posterior probability of a parameter set \boldsymbol{\beta} given a domain of events \boldsymbol{D}? That is, how do we define the posterior probability

p(\boldsymbol{\beta}\vert\boldsymbol{D}).

Bayes' theorem comes to our rescue here since (omitting the normalization constant)

p(\boldsymbol{\beta}\vert\boldsymbol{D})\propto p(\boldsymbol{D}\vert\boldsymbol{\beta})p(\boldsymbol{\beta}).

We have a model for p(\boldsymbol{D}\vert\boldsymbol{\beta}) but need one for the prior p(\boldsymbol{\beta})!
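As a sketch of where this leads, the snippet below assumes a Gaussian prior p(\boldsymbol{\beta})\propto\exp\left[-\lambda\boldsymbol{\beta}^T\boldsymbol{\beta}/(2\sigma^2)\right]; with that assumption the negative log posterior becomes the Ridge cost function, and its minimizer (the MAP estimate) has the closed form (\boldsymbol{X}^T\boldsymbol{X}+\lambda I)^{-1}\boldsymbol{X}^T\boldsymbol{y}. A Laplace prior would analogously lead to the Lasso cost, which has no closed-form solution and is therefore not shown. All names and data in the snippet are illustrative.

```python
import numpy as np

# Illustrative data (assumed setup)
rng = np.random.default_rng(1)
n, p = 50, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(0, 0.3, n)

lam = 1.0  # penalty strength, set by the assumed Gaussian prior

# MAP estimate under the Gaussian prior = Ridge solution
beta_map = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(beta_map)

def ridge_cost(beta):
    """Penalized least squares cost, i.e. -log posterior up to constants and scaling."""
    return np.sum((y - X @ beta) ** 2) + lam * np.sum(beta ** 2)

# The MAP/Ridge estimate attains the smallest penalized cost
print(ridge_cost(beta_map) <= ridge_cost(beta_true))  # True
```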