Fixing the singularity

If our design matrix $X$, which enters the linear regression problem

$$
\beta = \left(X^TX\right)^{-1}X^Ty,
$$

has linearly dependent column vectors, we will not be able to compute the inverse of $X^TX$ and we cannot find the parameters (estimators) $\beta_i$. The estimators are only well-defined if $\left(X^TX\right)^{-1}$ exists. This is more likely to happen when the matrix $X$ is high-dimensional, in which case we may well encounter situations where the regression parameters $\beta_i$ cannot be estimated.
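
To see this in practice, we can build a design matrix with an exactly dependent column and inspect the rank and condition number of $X^TX$. The sketch below is a minimal NumPy illustration with arbitrarily chosen sizes and data, not part of any particular dataset used in these notes.

```python
import numpy as np

np.random.seed(0)
n = 100
x = np.random.rand(n)

# Design matrix with linearly dependent columns:
# the third column is simply twice the second one.
X = np.column_stack([np.ones(n), x, 2 * x])

XtX = X.T @ X
print(np.linalg.matrix_rank(XtX))  # 2, not 3: X^T X is rank deficient
print(np.linalg.cond(XtX))         # huge condition number
# np.linalg.inv(XtX) would either raise a LinAlgError or return
# numerically meaningless values, so the estimators beta_i are not defined.
```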

A cheap ad hoc approach is simply to add a small diagonal component to the matrix we wish to invert, that is, we change

$$
X^TX \rightarrow X^TX + \lambda I,
$$

where $I$ is the identity matrix. When we discuss Ridge regression, this is actually what we end up evaluating. The parameter $\lambda$ is called a hyperparameter. More about this later.
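
A minimal sketch of this fix, reusing the artificial rank-deficient design matrix from above and an arbitrarily chosen value of $\lambda$, could look as follows. With $\lambda = 0$ the linear solve would fail or return meaningless numbers, while a small positive $\lambda$ makes the system well-posed.

```python
import numpy as np

np.random.seed(0)
n = 100
x = np.random.rand(n)

# Same rank-deficient design matrix as above (third column = 2 * second).
X = np.column_stack([np.ones(n), x, 2 * x])
y = 1.0 + 3.0 * x + 0.1 * np.random.randn(n)

lam = 1e-3          # hyperparameter lambda, value chosen only for illustration
p = X.shape[1]

# Invert X^T X + lambda * I instead of the singular X^T X
# (this is what Ridge regression ends up evaluating).
beta = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(beta)
```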