We assume now that the various y_i values are stochastically distributed according to the above Gaussian distribution. We define this distribution as
p(y_i, \boldsymbol{X}\vert\boldsymbol{\beta})=\frac{1}{\sqrt{2\pi\sigma^2}}\exp{\left[-\frac{(y_i-\boldsymbol{X}_{i,*}\boldsymbol{\beta})^2}{2\sigma^2}\right]}, which reads as the likelihood of an event y_i with the input variables \boldsymbol{X} given the parameters (to be determined) \boldsymbol{\beta}.
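As a minimal numerical sketch of this single-event density (the function name and the use of NumPy are illustrative, not from the text; \sigma^2 is assumed known):

```python
import numpy as np

def gaussian_likelihood(y_i, x_i, beta, sigma2):
    """Gaussian density of one target y_i around the model prediction x_i @ beta.

    x_i is one row X_{i,*} of the design matrix; sigma2 is the noise variance.
    """
    mean = x_i @ beta
    return np.exp(-(y_i - mean) ** 2 / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)
```

When y_i equals the prediction exactly, the exponential is one and the density reduces to the normalization factor 1/\sqrt{2\pi\sigma^2}.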
Since these events are assumed to be independent and identically distributed, we can build the probability distribution function (PDF) for all possible events \boldsymbol{y} as the product of the single-event probabilities, that is, we have
p(\boldsymbol{y},\boldsymbol{X}\vert\boldsymbol{\beta})=\prod_{i=0}^{n-1}\frac{1}{\sqrt{2\pi\sigma^2}}\exp{\left[-\frac{(y_i-\boldsymbol{X}_{i,*}\boldsymbol{\beta})^2}{2\sigma^2}\right]}=\prod_{i=0}^{n-1}p(y_i,\boldsymbol{X}\vert\boldsymbol{\beta}). We will write this in a more compact form, reserving \boldsymbol{D} for the domain of events, including the outputs (targets) and the inputs. That is, in the case of a simple one-dimensional input and output we have
\boldsymbol{D}=[(x_0,y_0), (x_1,y_1),\dots, (x_{n-1},y_{n-1})]. In the more general case the various inputs should be replaced by the possible features represented by the input data set \boldsymbol{X}. We can now rewrite the above probability as
p(\boldsymbol{D}\vert\boldsymbol{\beta})=\prod_{i=0}^{n-1}\frac{1}{\sqrt{2\pi\sigma^2}}\exp{\left[-\frac{(y_i-\boldsymbol{X}_{i,*}\boldsymbol{\beta})^2}{2\sigma^2}\right]}. It is a conditional probability (see below) and reads as the likelihood of a domain of events \boldsymbol{D} given a set of parameters \boldsymbol{\beta}.
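The full likelihood p(\boldsymbol{D}\vert\boldsymbol{\beta}) can be sketched as the product of the n single-event densities (again a NumPy sketch with illustrative names; \sigma^2 assumed known):

```python
import numpy as np

def likelihood(y, X, beta, sigma2):
    """Product over all events of the Gaussian densities, p(D | beta)."""
    residuals = y - X @ beta
    densities = np.exp(-residuals ** 2 / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)
    return np.prod(densities)
```

In practice one normally works with the logarithm of this product, since the product of many small densities underflows quickly as n grows.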