Using Bayes' theorem we can gain a better intuition about Ridge and Lasso regression.
For ordinary least squares we postulated that the likelihood for the domain of events $D$ (one-dimensional case)
$$
D=[(x_0,y_0),(x_1,y_1),\ldots,(x_{n-1},y_{n-1})],
$$
is given by
$$
p(D\vert\beta)=\prod_{i=0}^{n-1}\frac{1}{\sqrt{2\pi\sigma^2}}\exp{\left[-\frac{(y_i-X_{i,*}\beta)^2}{2\sigma^2}\right]}.
$$
In Bayes' theorem this function plays the role of the so-called likelihood. We could now ask the question: what is the posterior probability of a parameter set $\beta$ given a domain of events $D$? That is, how can we define the posterior probability
$$
p(\beta\vert D).
$$
Bayes' theorem comes to our rescue here since (omitting the normalization constant)
$$
p(\beta\vert D)\propto p(D\vert\beta)p(\beta).
$$
We have a model for $p(D\vert\beta)$ but need one for the prior $p(\beta)$!
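To make the connection to ordinary least squares explicit, it may help to take the negative logarithm of the proportionality above; this is a short worked step, with the normalization terms collected into a constant:
$$
-\ln p(\beta\vert D)=\frac{1}{2\sigma^2}\sum_{i=0}^{n-1}(y_i-X_{i,*}\beta)^2-\ln p(\beta)+\mathrm{const}.
$$
The likelihood thus contributes the familiar sum of squared residuals, while whatever prior $p(\beta)$ we choose enters as an additive penalty on the parameters.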
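As a small numerical illustration (a minimal sketch with made-up data and hypothetical names, not taken from the text), one can evaluate the Gaussian log-likelihood $\ln p(D\vert\beta)$ for different parameter values and check that the ordinary least squares fit maximizes it:

```python
import numpy as np

# Assumed toy model: y = beta0 + beta1*x + Gaussian noise with known sigma.
rng = np.random.default_rng(0)
n = 100
x = np.linspace(0.0, 1.0, n)
X = np.column_stack((np.ones(n), x))        # design matrix with intercept column
beta_true = np.array([1.0, 2.0])
sigma = 0.5
y = X @ beta_true + rng.normal(0.0, sigma, n)

def log_likelihood(beta, X, y, sigma):
    """Gaussian log p(D|beta): sum over i of log N(y_i | X_{i,*} beta, sigma^2)."""
    resid = y - X @ beta
    return -0.5 * y.size * np.log(2.0 * np.pi * sigma**2) - np.sum(resid**2) / (2.0 * sigma**2)

# Ordinary least squares estimate (pseudo-inverse of the design matrix)
beta_ols = np.linalg.pinv(X) @ y

print("log-likelihood at true beta:", log_likelihood(beta_true, X, y, sigma))
print("log-likelihood at OLS beta :", log_likelihood(beta_ols, X, y, sigma))
# Since the OLS estimate maximizes the likelihood for fixed sigma,
# the second value is at least as large as the first.
```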