One of the typical problems we encounter with linear regression, particularly when the design matrix $X$ is high-dimensional, is that of a singular or near-singular matrix. The column vectors of $X$ may be linearly dependent, a situation usually referred to as super-collinearity. The matrix is then rank deficient, and it is essentially impossible to model the data using ordinary linear regression. As an example, consider the matrix
$$
X = \begin{bmatrix}
1 & -1 & 2 \\
1 & 0 & 1 \\
1 & 2 & -1 \\
1 & 1 & 0
\end{bmatrix}
$$

The columns of $X$ are linearly dependent. We see this easily since the first column is the row-wise sum of the other two columns. The rank (more precisely, the column rank) of a matrix is the dimension of the space spanned by its column vectors. Hence, the rank of $X$ is equal to the number of linearly independent columns. In this particular case the matrix has rank 2.
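To make this concrete, here is a minimal sketch that verifies the rank deficiency numerically. It assumes NumPy, which the text itself does not mention:

```python
import numpy as np

# The design matrix from the example above.
X = np.array([[1, -1,  2],
              [1,  0,  1],
              [1,  2, -1],
              [1,  1,  0]])

# The first column equals the row-wise sum of the other two columns.
assert np.array_equal(X[:, 0], X[:, 1] + X[:, 2])

# Only 2 of the 3 columns are linearly independent.
print(np.linalg.matrix_rank(X))  # 2
```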
Super-collinearity of an $(n \times p)$-dimensional design matrix $X$ implies that the matrix $X^T X$ (the matrix we need to invert to solve the linear regression equations) is not invertible. A square matrix that does not have an inverse is called singular. The following example demonstrates this:
$$
X = \begin{bmatrix}
1 & -1 \\
1 & -1
\end{bmatrix}.
$$

We see easily that $\det(X) = x_{11}x_{22} - x_{12}x_{21} = 1 \times (-1) - (-1) \times 1 = 0$. Hence, $X$ is singular and its inverse is undefined. This is equivalent to saying that $X$ has at least one eigenvalue equal to zero.
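As a short sketch (again assuming NumPy), we can confirm the singularity by computing the determinant and the eigenvalues, and by observing that an attempted inversion fails:

```python
import numpy as np

X = np.array([[1, -1],
              [1, -1]])

print(np.linalg.det(X))      # 0.0: the determinant vanishes
print(np.linalg.eigvals(X))  # both eigenvalues are (numerically) zero

# Inverting a singular matrix raises a LinAlgError.
try:
    np.linalg.inv(X)
except np.linalg.LinAlgError as err:
    print(err)  # "Singular matrix"
```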