Optimizing our parameters

We have defined the matrix \( \boldsymbol{X} \) via the equations

$$ \begin{align*} y_0&=\theta_0x_{00}+\theta_1x_{01}+\theta_2x_{02}+\dots+\theta_{n-1}x_{0,n-1}+\epsilon_0\\ y_1&=\theta_0x_{10}+\theta_1x_{11}+\theta_2x_{12}+\dots+\theta_{n-1}x_{1,n-1}+\epsilon_1\\ y_2&=\theta_0x_{20}+\theta_1x_{21}+\theta_2x_{22}+\dots+\theta_{n-1}x_{2,n-1}+\epsilon_2\\ \dots & \dots \\ y_{i}&=\theta_0x_{i0}+\theta_1x_{i1}+\theta_2x_{i2}+\dots+\theta_{n-1}x_{i,n-1}+\epsilon_i\\ \dots & \dots \\ y_{n-1}&=\theta_0x_{n-1,0}+\theta_1x_{n-1,1}+\theta_2x_{n-1,2}+\dots+\theta_{n-1}x_{n-1,n-1}+\epsilon_{n-1}.\\ \end{align*} $$
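
For later reference, the same system can be summarized compactly in matrix-vector form (with all symbols as defined above) as

$$ \boldsymbol{y} = \boldsymbol{X}\boldsymbol{\theta} + \boldsymbol{\epsilon}, $$

with \( \boldsymbol{y},\boldsymbol{\theta},\boldsymbol{\epsilon}\in{\mathbb{R}}^{n} \) and \( \boldsymbol{X}\in{\mathbb{R}}^{n\times n} \).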

As we noted above, we have so far worked with a system where the design matrix is square, \( \boldsymbol{X}\in {\mathbb{R}}^{n\times n} \), that is \( p=n \). For reasons to come later (algorithmic arguments), we will hereafter define our matrix as \( \boldsymbol{X}\in {\mathbb{R}}^{n\times p} \), where the \( p \) columns refer to the predictors and the \( n \) rows to the data entries.
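
As a minimal sketch (not from the text above), the following code sets up such an \( n\times p \) design matrix for a hypothetical polynomial model in one variable, where column \( j \) holds \( x^j \); the rows index the \( n \) data points and the columns the \( p \) predictors. The chosen values of \( n \), \( p \), the parameter vector, and the noise level are illustrative assumptions only.

```python
import numpy as np

# Hypothetical sizes: n data points (rows), p predictors (columns)
n, p = 100, 5
np.random.seed(2024)                       # for reproducibility (assumed seed)

x = np.random.rand(n)                      # input values
X = np.vander(x, N=p, increasing=True)     # X[i, j] = x_i**j, shape (n, p)

theta_true = np.array([1.0, -2.0, 0.5, 3.0, -1.5])   # assumed parameters
epsilon = 0.1 * np.random.randn(n)                    # noise term
y = X @ theta_true + epsilon                          # y = X theta + epsilon

print(X.shape)   # (100, 5): n rows, p columns
```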