Using Bayes' theorem we can gain a better intuition about Ridge and Lasso regression.
For ordinary least squares we postulated that the likelihood for the domain of events $D$ (one-dimensional case)
$$
D=[(x_0,y_0),(x_1,y_1),\ldots,(x_{n-1},y_{n-1})],
$$
is given by
$$
p(D\vert\beta)=\prod_{i=0}^{n-1}\frac{1}{\sqrt{2\pi\sigma^2}}\exp{\left[-\frac{(y_i-X_{i,*}\beta)^2}{2\sigma^2}\right]}.
$$
In Bayes' theorem this function plays the role of the so-called likelihood. We could now ask the question: what is the posterior probability of a parameter set $\beta$ given a domain of events $D$? That is, how can we define the posterior probability
$$
p(\beta\vert D).
$$
Bayes' theorem comes to our rescue here since (omitting the normalization constant)
$$
p(\beta\vert D)\propto p(D\vert\beta)p(\beta).
$$
We have a model for $p(D\vert\beta)$ but need one for the prior $p(\beta)$!
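To make the connection to ordinary least squares explicit, it may help to take the negative logarithm of the proportionality above; this is a short worked step, with the normalization terms collected into a constant:
$$
-\ln p(\beta\vert D)=\frac{1}{2\sigma^2}\sum_{i=0}^{n-1}(y_i-X_{i,*}\beta)^2-\ln p(\beta)+\mathrm{const}.
$$
The likelihood thus contributes the familiar sum of squared residuals, while whatever prior $p(\beta)$ we choose enters as an additive penalty on the parameters.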
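As a small numerical illustration (a minimal sketch with made-up data and hypothetical names, not taken from the text), one can evaluate the Gaussian log-likelihood $\ln p(D\vert\beta)$ for different parameter values and check that the ordinary least squares fit maximizes it:

```python
import numpy as np

# Assumed toy model: y = beta0 + beta1*x + Gaussian noise with known sigma.
rng = np.random.default_rng(0)
n = 100
x = np.linspace(0.0, 1.0, n)
X = np.column_stack((np.ones(n), x))        # design matrix with intercept column
beta_true = np.array([1.0, 2.0])
sigma = 0.5
y = X @ beta_true + rng.normal(0.0, sigma, n)

def log_likelihood(beta, X, y, sigma):
    """Gaussian log p(D|beta): sum over i of log N(y_i | X_{i,*} beta, sigma^2)."""
    resid = y - X @ beta
    return -0.5 * y.size * np.log(2.0 * np.pi * sigma**2) - np.sum(resid**2) / (2.0 * sigma**2)

# Ordinary least squares estimate (pseudo-inverse of the design matrix)
beta_ols = np.linalg.pinv(X) @ y

print("log-likelihood at true beta:", log_likelihood(beta_true, X, y, sigma))
print("log-likelihood at OLS beta :", log_likelihood(beta_ols, X, y, sigma))
# Since the OLS estimate maximizes the likelihood for fixed sigma,
# the second value is at least as large as the first.
```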