Linear Regression Problems

One of the typical problems we encounter with linear regression, in particular when the matrix \( \boldsymbol{X} \) (our so-called design matrix) is high-dimensional, is that the matrices involved are singular or nearly singular. The column vectors of \( \boldsymbol{X} \) may be linearly dependent, normally referred to as super-collinearity. The matrix is then rank deficient, and the ordinary least squares equations no longer have a unique solution. As an example, consider the matrix

$$ \begin{align*} \mathbf{X} & = \left[ \begin{array}{rrr} 1 & -1 & 2 \\ 1 & 0 & 1 \\ 1 & 2 & -1 \\ 1 & 1 & 0 \end{array} \right] \end{align*} $$

The columns of \( \boldsymbol{X} \) are linearly dependent. We see this easily since the first column is the row-wise sum of the other two columns. The rank (more precisely, the column rank) of a matrix is the dimension of the space spanned by its column vectors. Hence, the rank of \( \mathbf{X} \) is equal to the number of linearly independent columns. In this particular case the matrix has rank 2.
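The linear dependence and the rank can also be checked numerically. The following is a minimal sketch, assuming NumPy is available; the variable names `first_is_sum` and `rank` are chosen here for illustration only.

```python
import numpy as np

# Design matrix from the example above
X = np.array([[1, -1,  2],
              [1,  0,  1],
              [1,  2, -1],
              [1,  1,  0]], dtype=float)

# Check that the first column equals the sum of the second and third columns
first_is_sum = np.allclose(X[:, 0], X[:, 1] + X[:, 2])

# Numerical rank, estimated from the singular values of X
rank = np.linalg.matrix_rank(X)

print("first column = sum of the other two:", first_is_sum)
print("rank of X:", rank)
```

Note that `np.linalg.matrix_rank` counts the singular values above a small tolerance, so for nearly (but not exactly) dependent columns the reported rank depends on that tolerance.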

Super-collinearity of an \( (n \times p) \)-dimensional design matrix \( \mathbf{X} \) implies that the matrix \( \boldsymbol{X}^T\boldsymbol{X} \) (the matrix we need to invert to solve the linear regression equations) is non-invertible. If a square matrix does not have an inverse, we say that the matrix is singular. The following example demonstrates this:

$$ \begin{align*} \boldsymbol{X} & = \left[ \begin{array}{rr} 1 & -1 \\ 1 & -1 \end{array} \right]. \end{align*} $$

We see easily that \( \mbox{det}(\boldsymbol{X}) = x_{11} x_{22} - x_{12} x_{21} = 1 \times (-1) - (-1) \times 1 = 0 \). Hence, \( \mathbf{X} \) is singular and its inverse is undefined. This is equivalent to saying that the matrix \( \boldsymbol{X} \) has at least one eigenvalue equal to zero.
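Both statements can be illustrated numerically. The sketch below, again assuming NumPy, computes the determinant and eigenvalues of the \( 2 \times 2 \) example and then the eigenvalues of \( \boldsymbol{X}^T\boldsymbol{X} \) for the \( 4 \times 3 \) design matrix with linearly dependent columns. The names `A`, `G`, `det_A`, `eig_A` and `eig_G` are introduced here for illustration and are not part of the original notation.

```python
import numpy as np

# The 2 x 2 example: both rows are identical, so the matrix is singular
A = np.array([[1.0, -1.0],
              [1.0, -1.0]])
det_A = np.linalg.det(A)
eig_A = np.linalg.eigvals(A)      # zero up to rounding error

# The 4 x 3 design matrix with linearly dependent columns
X = np.array([[1, -1,  2],
              [1,  0,  1],
              [1,  2, -1],
              [1,  1,  0]], dtype=float)
G = X.T @ X                       # the matrix that least squares needs to invert
eig_G = np.linalg.eigvalsh(G)     # G is symmetric; eigenvalues in ascending order

print("det(A)              =", det_A)
print("eigenvalues of A    =", eig_A)
print("eigenvalues of X^TX =", eig_G)
```

The smallest eigenvalue of \( \boldsymbol{X}^T\boldsymbol{X} \) vanishes up to rounding error, which is the numerical signature of a singular matrix. In floating-point arithmetic one should compare such values against a small tolerance rather than test for exact equality with zero.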