Deriving OLS from a probability distribution

Our basic assumption when we derived the OLS equations was to assume that our output is determined by a given continuous function \( f(\boldsymbol{x}) \) and a random noise \( \boldsymbol{\epsilon} \) given by the normal distribution with zero mean value and an undetermined variance \( \sigma^2 \).

We found above that the outputs \( \boldsymbol{y} \) have a mean value given by \( \boldsymbol{X}\hat{\boldsymbol{\beta}} \) and variance \( \sigma^2 \). Since the entries to the design matrix are not stochastic variables, we can assume that the probability distribution of our targets is also a normal distribution but now with mean value \( \boldsymbol{X}\hat{\boldsymbol{\beta}} \). This means that a single output \( y_i \) is given by the Gaussian distribution

$$ y_i\sim \mathcal{N}(\boldsymbol{X}_{i,*}\boldsymbol{\beta}, \sigma^2)=\frac{1}{\sqrt{2\pi\sigma^2}}\exp{\left[-\frac{(y_i-\boldsymbol{X}_{i,*}\boldsymbol{\beta})^2}{2\sigma^2}\right]}. $$