Principal component analysis (PCA) deals with the problem of fitting a low-dimensional affine subspace \( S \) of dimension \( d \) to a data set, where \( d \) is much smaller than the dimension \( D \) of the space in which the data live. Mathematically, it can be formulated either as a statistical problem or as a geometric one. In our discussion of the theorem for classical PCA, we will stay with the statistical approach. Historically, PCA was first formulated in a statistical setting, in order to estimate the principal components of a multivariate random variable.
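To make the idea of fitting a \( d \)-dimensional affine subspace in \( D \) dimensions concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the formulation developed in these notes: the function name `fit_affine_subspace` and the synthetic data are hypothetical, and the rows of `X` are assumed to be samples and the columns features. The subspace is described by a point `mu` (the sample mean) and an orthonormal basis `U`, obtained here from the singular value decomposition of the centered data.

```python
import numpy as np

def fit_affine_subspace(X, d):
    """Fit a d-dimensional affine subspace to the rows of X.

    Assumes X has shape (n_samples, D). Returns the sample mean mu
    (a point on the subspace) and a (D, d) matrix U whose orthonormal
    columns span the directions of the subspace.
    """
    mu = X.mean(axis=0)
    X_centered = X - mu
    # Rows of Vt are orthonormal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    U = Vt[:d].T
    return mu, U

# Synthetic data (an assumption for illustration): points close to a
# 2-dimensional affine subspace embedded in 5 dimensions.
rng = np.random.default_rng(0)
n, D, d = 200, 5, 2
latent = rng.normal(size=(n, d))
basis = rng.normal(size=(d, D))
offset = rng.normal(size=D)
X = latent @ basis + offset + 0.01 * rng.normal(size=(n, D))

mu, U = fit_affine_subspace(X, d)
Y = (X - mu) @ U       # low-dimensional coordinates, shape (n, d)
X_hat = mu + Y @ U.T   # projection of each sample onto the fitted subspace

# Relative error of the fit; small when the data lie near a d-dimensional subspace.
residual = np.linalg.norm(X - X_hat) / np.linalg.norm(X - mu)
print(residual)
```

The columns of `U` play the role of the principal directions and the rows of `Y` are the coordinates of the samples within the subspace. Using the SVD of the centered data rather than an eigendecomposition of the covariance matrix is a design choice of this sketch; the two routes give the same subspace.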
We have a data set defined by a design/feature matrix \( \boldsymbol{X} \) (see below for its definition)