Regression analysis, overarching aims

Regression modeling deals with describing the sampling distribution of a given random variable \( y \) and how it varies as a function of another variable, or a set of such variables, \( \boldsymbol{x} =[x_0, x_1,\dots, x_{n-1}]^T \). The first variable is called the dependent variable, the outcome or the response variable, while the set of variables \( \boldsymbol{x} \) is called the independent, predictor or explanatory variable, or simply the inputs.

A regression model aims at finding a likelihood function \( p(\boldsymbol{y}\vert \boldsymbol{x}) \), that is, the conditional distribution of \( \boldsymbol{y} \) given \( \boldsymbol{x} \), or, in the more traditional sense, a function \( \boldsymbol{y}(\boldsymbol{x}) \). The estimation of \( p(\boldsymbol{y}\vert \boldsymbol{x}) \) is made using a data set with

  • \( n \) cases \( i = 0, 1, 2, \dots, n-1 \)
  • Response (target, dependent or outcome) variable \( y_i \) with \( i = 0, 1, 2, \dots, n-1 \)
  • \( p \) so-called explanatory (independent or predictor or feature) variables \( \boldsymbol{x}_i=[x_{i0}, x_{i1}, \dots, x_{i,p-1}] \) with \( i = 0, 1, 2, \dots, n-1 \), the explanatory variables running from \( 0 \) to \( p-1 \). See below for more explicit examples.
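The indexing convention above can be sketched with a small synthetic data set (the numbers here are hypothetical, chosen only to illustrate the shapes): the responses \( y_i \) form a vector of length \( n \), and the explanatory variables form an \( n\times p \) design matrix whose row \( i \) is \( \boldsymbol{x}_i \).

```python
import numpy as np

# Hypothetical data set illustrating the indexing convention:
# n = 5 cases and p = 3 explanatory variables.
n, p = 5, 3
rng = np.random.default_rng(seed=0)

# Design matrix X: row i holds the explanatory variables
# x_i = [x_{i0}, x_{i1}, ..., x_{i,p-1}] for case i.
X = rng.normal(size=(n, p))

# Response vector y: one outcome y_i per case, i = 0, ..., n-1.
y = rng.normal(size=n)

print(X.shape)  # (5, 3): n cases, p explanatory variables
print(y.shape)  # (5,):   one response per case
```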

The goal of regression analysis is to extract/exploit the relationship between \( \boldsymbol{y} \) and \( \boldsymbol{x} \) in order to infer specific dependencies, approximate the likelihood function, establish functional relationships, make predictions and fits, and many other things.
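As a minimal sketch of what "extracting the relationship" can mean in practice, the example below fits a linear model \( \boldsymbol{y}\approx X\boldsymbol{\beta} \) by ordinary least squares, one common choice for the function \( \boldsymbol{y}(\boldsymbol{x}) \). The data, the coefficients and the noise level are all assumptions made up for the illustration.

```python
import numpy as np

# Synthetic data: n cases, p explanatory variables, with an
# assumed linear relationship plus Gaussian noise.
rng = np.random.default_rng(seed=1)
n, p = 100, 2
X = rng.normal(size=(n, p))
true_beta = np.array([2.0, -1.0])   # assumed "true" dependency
y = X @ true_beta + 0.1 * rng.normal(size=n)

# Ordinary least squares: minimize ||y - X beta||^2 over beta.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# The fitted coefficients can then be used to make predictions
# for new inputs x via x @ beta.
y_pred = X @ beta
print(beta)
```

The recovered `beta` should lie close to the assumed coefficients, and the fitted model can then be used for the predictive tasks mentioned above.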