In order to understand the relation among the predictors (or features or properties) \( p \), the set of data \( n \) and the target (outcome, output etc) \( \boldsymbol{y} \), consider the model we discussed for describing nuclear binding energies.
There we assumed that we could parametrize the data using a polynomial approximation based on the liquid drop model. Assuming
$$ BE(A) = a_0+a_1A+a_2A^{2/3}+a_3A^{-1/3}+a_4A^{-1}, $$we have five predictors, that is the intercept, the \( A \) dependent term, the \( A^{2/3} \) term and the \( A^{-1/3} \) and \( A^{-1} \) terms. This gives \( p=0,1,2,3,4 \). Furthermore we have \( n \) entries for each predictor. It means that our design matrix is a \( p\times n \) matrix \( \boldsymbol{X} \).
Here the predictors are based on a model we have made. A popular data set which is widely encountered in ML applications is the so-called credit card default data from Taiwan. The data set contains data on \( n=30000 \) credit card holders with predictors like gender, marital status, age, profession, education, etc. In total there are \( 24 \) such predictors or attributes leading to a design matrix of dimensionality \( 24 \times 30000 \). This is however a classification problem and we will come back to it when we discuss Logistic Regression.