Let us take a closer look at the mathematics of the SVD and the various implications for machine learning studies.
Our starting point is our design matrix \( \boldsymbol{X} \) of dimension \( n\times p \)
$$ \boldsymbol{X}=\begin{bmatrix} x_{0,0} & x_{0,1} & x_{0,2} & \dots & x_{0,p-1}\\ x_{1,0} & x_{1,1} & x_{1,2} & \dots & x_{1,p-1}\\ x_{2,0} & x_{2,1} & x_{2,2} & \dots & x_{2,p-1}\\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x_{n-2,0} & x_{n-2,1} & x_{n-2,2} & \dots & x_{n-2,p-1}\\ x_{n-1,0} & x_{n-1,1} & x_{n-1,2} & \dots & x_{n-1,p-1}\\ \end{bmatrix}. $$
We can decompose our matrix with the SVD as
$$ \boldsymbol{X}=\boldsymbol{U}\boldsymbol{\Sigma}\boldsymbol{V}^T, $$
where \( \boldsymbol{U} \) is an orthogonal matrix of dimension \( n\times n \), meaning that \( \boldsymbol{U}\boldsymbol{U}^T=\boldsymbol{U}^T\boldsymbol{U}=\boldsymbol{I}_n \). Here \( \boldsymbol{I}_n \) is the unit matrix of dimension \( n \times n \).
Similarly, \( \boldsymbol{V} \) is an orthogonal matrix of dimension \( p\times p \), meaning that \( \boldsymbol{V}\boldsymbol{V}^T=\boldsymbol{V}^T\boldsymbol{V}=\boldsymbol{I}_p \). Here \( \boldsymbol{I}_p \) is the unit matrix of dimension \( p \times p \).
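Finally, \( \boldsymbol{\Sigma} \) contains the singular values \( \sigma_i \). This matrix has dimension \( n\times p \) and the singular values \( \sigma_i \) are all non-negative. They sit on the diagonal in descending order, that is, assuming \( n \geq p \),
$$ \sigma_0 \geq \sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_{p-1} \geq 0. $$
All other elements of \( \boldsymbol{\Sigma} \), including the rows beyond index \( p-1 \), are zero. If \( \boldsymbol{X} \) does not have full column rank, some of the trailing singular values are zero as well.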
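The properties above can be checked numerically. The following is a minimal sketch, assuming NumPy is available and using a small random matrix as a stand-in for a design matrix; the sizes `n = 6`, `p = 4` and the seed are arbitrary choices made for illustration. Note that `np.linalg.svd` returns the singular values as a one-dimensional array, so the \( n\times p \) matrix \( \boldsymbol{\Sigma} \) has to be assembled by hand.

```python
import numpy as np

# Illustrative sizes and seed (assumptions, not taken from the text)
n, p = 6, 4
rng = np.random.default_rng(2021)
X = rng.normal(size=(n, p))  # stand-in for a design matrix

# full_matrices=True returns U with shape (n, n) and V^T with shape (p, p)
U, s, VT = np.linalg.svd(X, full_matrices=True)

# s is a 1-D array with min(n, p) singular values; place them on the
# diagonal of an n x p matrix whose remaining elements are zero
Sigma = np.zeros((n, p))
Sigma[:len(s), :len(s)] = np.diag(s)

print("shapes:", U.shape, Sigma.shape, VT.shape)
print("U orthogonal:", np.allclose(U @ U.T, np.eye(n)))
print("V orthogonal:", np.allclose(VT.T @ VT, np.eye(p)))
print("descending and non-negative:",
      bool(np.all(s[:-1] >= s[1:]) and np.all(s >= 0)))
print("X recovered:", np.allclose(U @ Sigma @ VT, X))
```

With `full_matrices=False` NumPy instead returns the economy-size factors, with \( \boldsymbol{U} \) of dimension \( n\times p \) when \( n \geq p \), which is often what one wants in practice since the last \( n-p \) columns of \( \boldsymbol{U} \) multiply only zeros.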