Consider an experiment in which \( p \) characteristics (features) of \( n \) samples are measured. The data from this experiment are normally represented by a matrix \( \mathbf{X} \in \mathbb{R}^{n \times p} \), with one row per sample and one column per explanatory variable.
The matrix \( \mathbf{X} \) is called the design matrix. Additional information on the samples is available in the form of a vector \( \boldsymbol{y} \) with one entry per sample. The variable \( \boldsymbol{y} \) is generally referred to as the response variable. The aim of regression analysis is to explain \( \boldsymbol{y} \) in terms of \( \mathbf{X} \) through a functional relationship like \( y_i = f(\mathbf{X}_{i,\ast}) \), where \( \mathbf{X}_{i,\ast} \) denotes row \( i \) of the design matrix. When no prior knowledge on the form of \( f(\cdot) \) is available, it is common to assume a linear relationship between \( \mathbf{X} \) and \( \boldsymbol{y} \). This assumption gives rise to the linear regression model \( \boldsymbol{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon} \), where \( \boldsymbol{\beta} = [\beta_0, \ldots, \beta_{p-1}]^{T} \) are the regression parameters and \( \boldsymbol{\epsilon} \) collects the part of \( \boldsymbol{y} \) that the linear relationship does not explain.
For linear regression, the parameters \( \beta_j \) can be obtained from a set of analytical (closed-form) equations.
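As an illustration of what such analytical equations can look like, the sketch below assumes the parameters are chosen by ordinary least squares, in which case they satisfy the normal equations \( \mathbf{X}^{T}\mathbf{X}\boldsymbol{\beta} = \mathbf{X}^{T}\boldsymbol{y} \). The text above does not fix this choice of criterion, so treat it as an assumption. The synthetic data, the column of ones used for \( \beta_0 \), the noise level, and all variable names are likewise illustrative only.

```python
import numpy as np

# Synthetic example: n samples, p regression parameters (assumed values).
rng = np.random.default_rng(seed=0)
n, p = 100, 3

# Design matrix X of shape (n, p). As an assumption, the first column is
# all ones so that beta_0 acts as an intercept; the remaining p - 1
# columns are random features.
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])

# Response y generated from known parameters plus a small noise term.
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + 0.1 * rng.normal(size=n)

# Ordinary least squares via the normal equations X^T X beta = X^T y.
# Solving the linear system avoids forming the inverse explicitly.
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# The same fit via NumPy's least-squares solver, which is usually the
# more robust choice when X^T X is ill-conditioned.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print("normal equations:", beta_normal)
print("lstsq:           ", beta_lstsq)
```

Both routes should agree to numerical precision on well-conditioned data like this. In practice `np.linalg.lstsq` is often preferred over solving the normal equations directly, since forming \( \mathbf{X}^{T}\mathbf{X} \) squares the condition number of the problem.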