We assume now that the various $y_i$ values are stochastically distributed according to the above Gaussian distribution. We define this distribution as
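$$
p(y_i,\boldsymbol{X}\vert\boldsymbol{\beta})=\frac{1}{\sqrt{2\pi\sigma^2}}\exp{\left[-\frac{(y_i-\boldsymbol{X}_{i,*}\boldsymbol{\beta})^2}{2\sigma^2}\right]},
$$
which reads as finding the likelihood of an event $y_i$ with the input variables $\boldsymbol{X}$ given the parameters (to be determined) $\boldsymbol{\beta}$.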
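As a small numerical illustration, the single-event likelihood can be evaluated directly with NumPy. This is a minimal sketch, assuming that the row $\boldsymbol{X}_{i,*}$ is available as a one-dimensional NumPy array and that $\sigma$ is known and shared by all events; the helper name `single_event_likelihood` and the numbers are hypothetical.

```python
import numpy as np

def single_event_likelihood(y_i, x_row, beta, sigma):
    """Gaussian likelihood p(y_i, X | beta) of one target y_i.

    Hypothetical helper: x_row plays the role of X_{i,*}, and sigma is
    assumed known and identical for all events.
    """
    residual = y_i - x_row @ beta                 # y_i - X_{i,*} beta
    norm = 1.0 / np.sqrt(2.0 * np.pi * sigma**2)  # 1 / sqrt(2 pi sigma^2)
    return norm * np.exp(-residual**2 / (2.0 * sigma**2))

# One event with an intercept column and a single feature (made-up numbers)
x_row = np.array([1.0, 2.0])
beta = np.array([0.5, 1.0])
# Here the residual is zero, so the result is 1/sqrt(2*pi), approximately 0.3989
print(single_event_likelihood(2.5, x_row, beta, sigma=1.0))
```

The likelihood is largest when the prediction $\boldsymbol{X}_{i,*}\boldsymbol{\beta}$ matches $y_i$ exactly and decays with the squared residual.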
Since these events are assumed to be independent and identically distributed, we can build the probability distribution function (PDF) for all possible events $\boldsymbol{y}$ as the product of the single events, that is we have
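$$
p(\boldsymbol{y},\boldsymbol{X}\vert\boldsymbol{\beta})=\prod_{i=0}^{n-1}\frac{1}{\sqrt{2\pi\sigma^2}}\exp{\left[-\frac{(y_i-\boldsymbol{X}_{i,*}\boldsymbol{\beta})^2}{2\sigma^2}\right]}=\prod_{i=0}^{n-1}p(y_i,\boldsymbol{X}\vert\boldsymbol{\beta}).
$$
We will write this in a more compact form, reserving $\boldsymbol{D}$ for the domain of events, including the outputs (targets) and the inputs. That is, in the case of a simple one-dimensional input and output we have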
$$
\boldsymbol{D}=[(x_0,y_0), (x_1,y_1),\dots, (x_{n-1},y_{n-1})].
$$
In the more general case the various inputs should be replaced by the possible features represented by the input data set $\boldsymbol{X}$. We can now rewrite the above probability as
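$$
p(\boldsymbol{D}\vert\boldsymbol{\beta})=\prod_{i=0}^{n-1}\frac{1}{\sqrt{2\pi\sigma^2}}\exp{\left[-\frac{(y_i-\boldsymbol{X}_{i,*}\boldsymbol{\beta})^2}{2\sigma^2}\right]}.
$$
It is a conditional probability (see below) and reads as the likelihood of a domain of events $\boldsymbol{D}$ given a set of parameters $\boldsymbol{\beta}$.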
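The likelihood $p(\boldsymbol{D}\vert\boldsymbol{\beta})$ of a whole data set can be sketched in the same way, as the product of the single-event terms over all $n$ independent events. This is a minimal sketch for the one-dimensional case, assuming (our choice, for illustration only) that each row of the design matrix is $\boldsymbol{X}_{i,*}=[1, x_i]$, i.e. an intercept plus one feature, and that $\sigma$ is known; the function name `likelihood_of_domain` and the data are hypothetical.

```python
import numpy as np

def likelihood_of_domain(D, beta, sigma):
    """p(D | beta) for D = [(x_0, y_0), ..., (x_{n-1}, y_{n-1})].

    Assumption: row i of the design matrix is X_{i,*} = [1, x_i]
    (intercept plus one feature), and sigma is known and shared.
    """
    x = np.array([xi for xi, _ in D])
    y = np.array([yi for _, yi in D])
    X = np.column_stack([np.ones_like(x), x])   # design matrix, row i is X_{i,*}
    residuals = y - X @ beta                    # y_i - X_{i,*} beta for all i
    single = np.exp(-residuals**2 / (2.0 * sigma**2)) / np.sqrt(2.0 * np.pi * sigma**2)
    return np.prod(single)                      # product over independent events

# Made-up data set D with three events
D = [(0.0, 1.1), (1.0, 2.9), (2.0, 5.2)]
print(likelihood_of_domain(D, beta=np.array([1.0, 2.0]), sigma=0.5))
```

Parameters $\boldsymbol{\beta}$ that fit the data well give a larger value of $p(\boldsymbol{D}\vert\boldsymbol{\beta})$. For large $n$ a product of many numbers smaller than one can underflow in floating point, which is one practical reason to work with the logarithm of the likelihood instead.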