Week 34: Introduction to the course, Logistics and Practicalities

Regression analysis, overarching aims II

Consider an experiment in which $p$ characteristics/features of $n$ samples are measured. The data from this experiment, for the $p$ explanatory variables, are normally represented by a matrix $\mathbf{X} \in \mathbb{R}^{n\times p}$, with one row per sample and one column per feature.
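As a small illustration of this layout, the sketch below builds a matrix with $n=5$ rows and $p=3$ columns. The data values and the choice of polynomial features ($1, x, x^2$) are assumptions made only for this example; any set of $p$ measured features would fill the columns in the same way.

```python
import numpy as np

# Hypothetical data: n = 5 samples of a single measured quantity x.
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
n = x.shape[0]
p = 3  # number of features (columns); here 1, x and x^2 (an assumed choice)

# Row i holds the p features of sample i, column j holds feature j for all samples.
X = np.column_stack([x**j for j in range(p)])

print(X.shape)  # (n, p)
print(X)
```

Each row of `X` corresponds to one sample and each column to one feature, which is the convention used in the rest of this section.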

The matrix $\mathbf{X}$ is called the design matrix. Additional information on the samples is available in the form of $\boldsymbol{y}$ (also as above), which is generally referred to as the response variable. The aim of regression analysis is to explain $\boldsymbol{y}$ in terms of $\mathbf{X}$ through a functional relationship like $y_i = f(\mathbf{X}_{i,\ast})$, where $\mathbf{X}_{i,\ast}$ denotes row $i$ of the design matrix. When no prior knowledge of the form of $f(\cdot)$ is available, it is common to assume a linear relationship between $\mathbf{X}$ and $\boldsymbol{y}$. This assumption gives rise to the linear regression model $y_i = \mathbf{X}_{i,\ast}\boldsymbol{\beta} = \sum_{j=0}^{p-1} X_{ij}\beta_j$, where $\boldsymbol{\beta} = [\beta_0, \ldots, \beta_{p-1}]^{T}$ are the regression parameters.
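Under the linear assumption, each $y_i$ is the inner product of row $i$ of the design matrix with the parameter vector, so the whole model can be evaluated as a single matrix-vector product. The sketch below checks this for a toy case; the values of `x` and `beta` are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical inputs: n = 4 samples, p = 2 features (an intercept column and x).
x = np.linspace(0.0, 1.0, 4)
X = np.column_stack([np.ones_like(x), x])

# Hypothetical regression parameters [beta_0, beta_1].
beta = np.array([1.0, 2.0])

# Whole model at once: y = X beta.
y_model = X @ beta

# Same thing sample by sample: y_i = X[i, :] . beta.
y_rowwise = np.array([X[i, :] @ beta for i in range(X.shape[0])])

print(y_model)
print(np.allclose(y_model, y_rowwise))
```

With this choice of columns the model is simply the straight line $\beta_0 + \beta_1 x$ evaluated at every sample.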

Linear regression gives us a set of analytical equations for the parameters $\beta_j$ .
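To make the analytical equations concrete, the sketch below assumes the parameters are determined by ordinary least squares, that is, by minimizing $\sum_i (y_i - \mathbf{X}_{i,\ast}\boldsymbol{\beta})^2$. This choice of cost function is an assumption not stated in this section. Under it, the parameters satisfy the normal equations $\mathbf{X}^T\mathbf{X}\boldsymbol{\beta} = \mathbf{X}^T\boldsymbol{y}$, which can be solved directly when $\mathbf{X}^T\mathbf{X}$ is invertible. The synthetic data, the noise level and the "true" parameters are all made up for the example.

```python
import numpy as np

# Synthetic data (all values here are illustrative assumptions).
rng = np.random.default_rng(2024)
n = 100
x = rng.uniform(0.0, 1.0, n)
beta_true = np.array([2.0, -1.0, 0.5])

# Design matrix with p = 3 columns: 1, x, x^2.
X = np.column_stack([np.ones(n), x, x**2])

# Response generated from the linear model plus a small amount of noise.
y = X @ beta_true + 0.01 * rng.normal(size=n)

# Solve the normal equations (X^T X) beta = X^T y without forming an explicit inverse.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against NumPy's least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_hat)
print(np.allclose(beta_hat, beta_lstsq))
```

Using `np.linalg.solve` rather than `np.linalg.inv` avoids computing the inverse explicitly; for ill-conditioned design matrices, `np.linalg.lstsq` or an SVD-based pseudoinverse is usually the more robust choice.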