Bayes' Theorem and Ridge and Lasso Regression

Using Bayes' theorem we can gain a better intuition about Ridge and Lasso regression.

For ordinary least squares we postulated that the likelihood for the domain of events $\mathcal{D}$ (one-dimensional case)

$$\mathcal{D}=\left[(x_0,y_0),(x_1,y_1),\dots,(x_{n-1},y_{n-1})\right],$$

is given by

$$p(\mathcal{D}\vert\boldsymbol{\beta})=\prod_{i=0}^{n-1}\frac{1}{\sqrt{2\pi\sigma^2}}\exp{\left[-\frac{\left(y_i-\boldsymbol{X}_{i,*}\boldsymbol{\beta}\right)^2}{2\sigma^2}\right]}.$$
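As a minimal numerical sketch of this likelihood (the synthetic data, the value of $\sigma$, and the use of numpy are assumptions made here purely for illustration), the log of the product above can be evaluated directly, and its maximizer over $\boldsymbol{\beta}$ is the ordinary least-squares estimate, since only the residual sum of squares depends on $\boldsymbol{\beta}$:

```python
import numpy as np

# Minimal sketch with illustrative assumptions: synthetic data y = X @ beta + Gaussian noise.
rng = np.random.default_rng(0)
n = 100
x = rng.uniform(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x])             # design matrix with intercept
beta_true = np.array([1.0, 2.0])
sigma = 0.1                                      # assumed noise level
y = X @ beta_true + rng.normal(0.0, sigma, n)

def log_likelihood(beta, X, y, sigma):
    """Log of the Gaussian likelihood p(D|beta) written as a product above."""
    resid = y - X @ beta
    return (-0.5 * len(y) * np.log(2.0 * np.pi * sigma ** 2)
            - np.sum(resid ** 2) / (2.0 * sigma ** 2))

# Only the residual sum of squares depends on beta, so the maximizer of
# log p(D|beta) is the ordinary least-squares estimate.
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print("OLS estimate:           ", beta_ols)
print("log-likelihood at OLS:  ", log_likelihood(beta_ols, X, y, sigma))
print("log-likelihood at truth:", log_likelihood(beta_true, X, y, sigma))
```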

In Bayes' theorem this function plays the role of the so-called likelihood. We could now ask: what is the posterior probability of a parameter set $\boldsymbol{\beta}$ given a domain of events $\mathcal{D}$? That is, how do we define the posterior probability

$$p(\boldsymbol{\beta}\vert\mathcal{D}).$$

Bayes' theorem comes to our rescue here since (omitting the normalization constant)

$$p(\boldsymbol{\beta}\vert\mathcal{D})\propto p(\mathcal{D}\vert\boldsymbol{\beta})\,p(\boldsymbol{\beta}).$$
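Since the likelihood is a product over data points, it is convenient to take logarithms; the proportionality then becomes, up to an additive constant from the omitted normalization,

$$\log p(\boldsymbol{\beta}\vert\mathcal{D}) = \log p(\mathcal{D}\vert\boldsymbol{\beta}) + \log p(\boldsymbol{\beta}) + \mathrm{const},$$

so maximizing the posterior amounts to adding a prior-dependent term to the log-likelihood we already have.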

We have a model for $p(\mathcal{D}\vert\boldsymbol{\beta})$ but need one for the prior $p(\boldsymbol{\beta})$!
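As a hedged sketch of where this leads (the zero-mean Gaussian prior, the prior precision `lam`, and the synthetic data below are assumptions chosen for illustration, not prescriptions from the text above): choosing $p(\boldsymbol{\beta})\propto\exp\left(-\lambda\,\boldsymbol{\beta}^T\boldsymbol{\beta}/2\right)$ makes the negative log posterior, up to constants, a Ridge-type cost function, so the maximum a posteriori (MAP) estimate takes the familiar Ridge closed form.

```python
import numpy as np

# Sketch under an assumed zero-mean Gaussian prior p(beta) ~ exp(-lam * ||beta||^2 / 2).
# The negative log posterior is then, up to constants,
#   (1/(2*sigma^2)) * ||y - X @ beta||^2 + (lam/2) * ||beta||^2,
# i.e. a Ridge-type cost whose minimizer (the MAP estimate) is computed below.
rng = np.random.default_rng(1)
n, p = 50, 3
X = rng.normal(size=(n, p))                      # illustrative design matrix
beta_true = np.array([1.0, -2.0, 0.5])
sigma = 0.5
y = X @ beta_true + rng.normal(0.0, sigma, n)    # synthetic targets

lam = 1.0                                        # assumed prior precision (illustrative)
ridge_penalty = lam * sigma ** 2                 # effective Ridge parameter
beta_map = np.linalg.solve(X.T @ X + ridge_penalty * np.eye(p), X.T @ y)
print("MAP (Ridge) estimate:", beta_map)
```

Replacing the Gaussian prior with a Laplace prior would instead contribute an $\ell_1$ term to the negative log posterior, which is the Lasso cost function.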