Week 34: Introduction to the course, Logistics and Practicalities

Examples

To understand the relation between the number of predictors (or features or properties) $p$, the number of data points $n$ and the target (outcome, output, etc.) $\boldsymbol{y}$, consider the model we discussed for describing nuclear binding energies.

There we assumed that we could parametrize the data as a function of the mass number $A$, using an expansion in powers of $A$ motivated by the liquid drop model. Assuming

$BE(A) = a_0+a_1A+a_2A^{2/3}+a_3A^{-1/3}+a_4A^{-1},$

we have five predictors: the intercept, the term linear in $A$, and the $A^{2/3}$, $A^{-1/3}$ and $A^{-1}$ terms. Labelling them with the index $j=0,1,2,3,4$ gives $p=5$ predictors. Furthermore, we have $n$ entries for each predictor. With the usual convention that rows are data points and columns are predictors, our design matrix $\boldsymbol{X}$ is then an $n\times p$ matrix.
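As a minimal sketch of how such a design matrix could be set up with NumPy, consider the snippet below. It assumes the convention that each row is one data point and each column one predictor, so the result has shape $n\times 5$. The function name `design_matrix` and the mass numbers in `A` are hypothetical and chosen only for illustration; they are not data from the course.

```python
import numpy as np

def design_matrix(A):
    """Build the n x 5 design matrix for the binding-energy model above.

    Columns, in order: 1, A, A^(2/3), A^(-1/3), A^(-1).
    """
    A = np.asarray(A, dtype=float)
    return np.column_stack([
        np.ones_like(A),      # intercept, multiplies a_0
        A,                    # multiplies a_1
        A**(2.0 / 3.0),       # multiplies a_2
        A**(-1.0 / 3.0),      # multiplies a_3
        A**(-1.0),            # multiplies a_4
    ])

# Hypothetical mass numbers, for illustration only
A = np.array([8.0, 27.0, 64.0, 125.0, 216.0, 343.0])
X = design_matrix(A)

print(X.shape)  # (number of data points n, number of predictors p)
```

With the six values above, `X` has six rows and five columns. Given a vector of parameters $a_0,\dots,a_4$, the model values for all data points are then obtained as the matrix-vector product of `X` with that parameter vector.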

Here the predictors are based on a model we have made. A popular data set which is widely encountered in ML applications is the so-called credit card default data from Taiwan. The data set contains data on $n=30000$ credit card holders with predictors like gender, marital status, age, profession, education, etc. In total there are $24$ such predictors or attributes, leading to a design matrix of dimensionality $30000 \times 24$ (one row per card holder, one column per predictor). This is, however, a classification problem, and we will come back to it when we discuss Logistic Regression.