Regression analysis, overarching aims

Regression modeling deals with describing the sampling distribution of a given random variable \( y \) and how it varies as a function of another variable, or a set of such variables, \( \boldsymbol{x} =[x_0, x_1,\dots, x_{n-1}]^T \). The first variable is called the dependent variable, the outcome or the response variable, while the set of variables \( \boldsymbol{x} \) is called the independent, predictor or explanatory variable, or simply the inputs.

A regression model aims at finding a likelihood function \( p(\boldsymbol{y}\vert \boldsymbol{x}) \), that is, the conditional distribution of \( \boldsymbol{y} \) given \( \boldsymbol{x} \), or, in the more traditional sense, a function \( \boldsymbol{y}(\boldsymbol{x}) \). The estimation of \( p(\boldsymbol{y}\vert \boldsymbol{x}) \) is made using a data set with

  • \( n \) cases \( i = 0, 1, 2, \dots, n-1 \)
  • Response (target, dependent or outcome) variable \( y_i \) with \( i = 0, 1, 2, \dots, n-1 \)
  • \( p \) so-called explanatory (independent or predictor or feature) variables \( \boldsymbol{x}_i=[x_{i0}, x_{i1}, \dots, x_{i,p-1}] \) with \( i = 0, 1, 2, \dots, n-1 \), the explanatory variables running from \( 0 \) to \( p-1 \). See below for more explicit examples.
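The indexing convention above can be sketched with a small synthetic data set (the numbers here are hypothetical, chosen only to illustrate the shapes): the responses \( y_i \) form a vector of length \( n \), and the explanatory variables form an \( n\times p \) design matrix whose row \( i \) is \( \boldsymbol{x}_i \).

```python
import numpy as np

# Hypothetical data set illustrating the indexing convention:
# n = 5 cases and p = 3 explanatory variables.
n, p = 5, 3
rng = np.random.default_rng(seed=0)

# Design matrix X: row i holds the explanatory variables
# x_i = [x_{i0}, x_{i1}, ..., x_{i,p-1}] for case i.
X = rng.normal(size=(n, p))

# Response vector y: one outcome y_i per case, i = 0, ..., n-1.
y = rng.normal(size=n)

print(X.shape)  # (5, 3): n cases, p explanatory variables
print(y.shape)  # (5,):   one response per case
```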

The goal of regression analysis is to extract/exploit the relationship between \( \boldsymbol{y} \) and \( \boldsymbol{x} \) in order to infer specific dependencies, approximate the likelihood function, establish functional relationships, make predictions and fits, and many other things.
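As a minimal sketch of what "extracting the relationship" can mean in practice, the example below fits a linear model \( \boldsymbol{y}\approx X\boldsymbol{\beta} \) by ordinary least squares, one common choice for the function \( \boldsymbol{y}(\boldsymbol{x}) \). The data, the coefficients and the noise level are all assumptions made up for the illustration.

```python
import numpy as np

# Synthetic data: n cases, p explanatory variables, with an
# assumed linear relationship plus Gaussian noise.
rng = np.random.default_rng(seed=1)
n, p = 100, 2
X = rng.normal(size=(n, p))
true_beta = np.array([2.0, -1.0])   # assumed "true" dependency
y = X @ true_beta + 0.1 * rng.normal(size=n)

# Ordinary least squares: minimize ||y - X beta||^2 over beta.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# The fitted coefficients can then be used to make predictions
# for new inputs x via x @ beta.
y_pred = X @ beta
print(beta)
```

The recovered `beta` should lie close to the assumed coefficients, and the fitted model can then be used for the predictive tasks mentioned above.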