Examples

In order to understand the relation between the number of predictors (or features or properties) p , the number of data points n , and the target (outcome, output, etc.) \boldsymbol{y} , consider the model we discussed for describing nuclear binding energies.

There we assumed that we could parametrize the data using a polynomial approximation based on the liquid drop model. Assuming

BE(A) = a_0+a_1A+a_2A^{2/3}+a_3A^{-1/3}+a_4A^{-1},

we have five predictors: the intercept, the term linear in A , the A^{2/3} term, and the A^{-1/3} and A^{-1} terms. This gives p=5 predictors, indexed by 0,1,2,3,4 . Furthermore, we have n entries (one per nucleus) for each predictor. It means that our design matrix is an n\times p matrix \boldsymbol{X} .
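As a sketch of how such a design matrix can be assembled, consider the NumPy snippet below. The mass numbers A are arbitrary illustrative values, not fitted data; each row corresponds to one nucleus and each column to one of the five predictors in the liquid drop parametrization above.

```python
import numpy as np

# Hypothetical mass numbers A for n = 5 nuclei (illustrative values only)
A = np.array([16.0, 40.0, 56.0, 120.0, 208.0])
n = len(A)

# Design matrix with p = 5 columns: 1, A, A^{2/3}, A^{-1/3}, A^{-1}
X = np.column_stack((np.ones(n), A, A**(2.0/3.0), A**(-1.0/3.0), 1.0/A))

print(X.shape)  # (5, 5): n = 5 rows (nuclei), p = 5 columns (predictors)
```

With the design matrix in hand, the binding energies would be modeled as \boldsymbol{y} \approx \boldsymbol{X}\boldsymbol{\theta} for a parameter vector \boldsymbol{\theta} = (a_0, a_1, a_2, a_3, a_4) .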

Here the predictors are based on a model we have made ourselves. A popular data set which is widely encountered in ML applications is the so-called credit card default data from Taiwan. The data set contains information on n=30000 credit card holders, with predictors like gender, marital status, age, profession, education, etc. In total there are 24 such predictors or attributes, leading to a design matrix of dimensionality 30000 \times 24 . This is however a classification problem, and we will come back to it when we discuss Logistic Regression.