With the above we use the design matrix to define the approximation $\boldsymbol{\tilde{y}}$ via the unknown quantity $\boldsymbol{\beta}$ as

$$
\boldsymbol{\tilde{y}}= \boldsymbol{X}\boldsymbol{\beta},
$$

and, in order to find the optimal parameters $\beta_i$ instead of solving the above linear algebra problem directly, we define a function that measures the spread between the values $y_i$ (which we hope represent the exact values) and the parameterized values $\tilde{y}_i$, namely
$$
C(\boldsymbol{\beta})=\frac{1}{n}\sum_{i=0}^{n-1}\left(y_i-\tilde{y}_i\right)^2=\frac{1}{n}\left\{\left(\boldsymbol{y}-\boldsymbol{\tilde{y}}\right)^T\left(\boldsymbol{y}-\boldsymbol{\tilde{y}}\right)\right\},
$$

or, using the matrix $\boldsymbol{X}$, in a more compact matrix-vector notation as
$$
C(\boldsymbol{\beta})=\frac{1}{n}\left\{\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta}\right)^T\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta}\right)\right\}.
$$

This function is one possible way to define the so-called cost function.
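As a quick numerical sanity check, the element-wise sum and the matrix-vector form of $C(\boldsymbol{\beta})$ should give the same number. The sketch below uses NumPy and assumes a polynomial design matrix built with `np.vander`. The helper names `design_matrix`, `cost_sum` and `cost_matrix`, the synthetic data and the trial values of $\boldsymbol{\beta}$ are hypothetical and chosen only for illustration.

```python
import numpy as np

def design_matrix(x, degree):
    # Hypothetical helper: polynomial design matrix with columns x^0, x^1, ..., x^degree
    return np.vander(x, N=degree + 1, increasing=True)

def cost_sum(y, y_tilde):
    # C(beta) = (1/n) * sum_i (y_i - ytilde_i)^2, written as an explicit sum
    n = len(y)
    return np.sum((y - y_tilde) ** 2) / n

def cost_matrix(X, y, beta):
    # C(beta) = (1/n) (y - X beta)^T (y - X beta), written in matrix-vector form
    r = y - X @ beta
    return (r @ r) / len(y)

# Synthetic data (an assumption for illustration): a noisy straight line
rng = np.random.default_rng(2024)
x = np.linspace(0.0, 1.0, 20)
y = 2.0 + 3.0 * x + 0.1 * rng.standard_normal(x.size)

X = design_matrix(x, degree=1)
beta = np.array([1.5, 2.5])  # arbitrary trial parameters, not the optimum
y_tilde = X @ beta

print(cost_sum(y, y_tilde), cost_matrix(X, y, beta))
```

Both printed values agree up to floating-point rounding. The matrix form avoids an explicit Python loop and maps directly onto the compact notation above.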
It is also common to define the function $C$ as

$$
C(\boldsymbol{\beta})=\frac{1}{2n}\sum_{i=0}^{n-1}\left(y_i-\tilde{y}_i\right)^2,
$$

since, when taking the first derivative with respect to the unknown parameters $\boldsymbol{\beta}$, the factor of 2 produced by differentiating the square cancels the factor $1/2$.
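With the $1/(2n)$ convention, the gradient is $\partial C/\partial\boldsymbol{\beta} = -\frac{1}{n}\boldsymbol{X}^T\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta}\right)$, with no leftover factor of 2. With the $1/n$ convention it would instead be $-\frac{2}{n}\boldsymbol{X}^T\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta}\right)$. The NumPy sketch below checks the analytic gradient against central finite differences. The function names `cost_half`, `grad_half` and `numerical_grad`, the quadratic test data and the trial $\boldsymbol{\beta}$ are all hypothetical.

```python
import numpy as np

def cost_half(X, y, beta):
    # C(beta) = (1/(2n)) (y - X beta)^T (y - X beta)
    r = y - X @ beta
    return (r @ r) / (2 * len(y))

def grad_half(X, y, beta):
    # Analytic gradient of cost_half: -(1/n) X^T (y - X beta); the 2 and the 1/2 cancel
    return -X.T @ (y - X @ beta) / len(y)

def numerical_grad(f, beta, h=1e-6):
    # Central finite differences, perturbing one parameter at a time
    g = np.zeros_like(beta)
    for j in range(beta.size):
        e = np.zeros_like(beta)
        e[j] = h
        g[j] = (f(beta + e) - f(beta - e)) / (2 * h)
    return g

# Synthetic data (an assumption for illustration): a noisy quadratic
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 30)
X = np.vander(x, N=3, increasing=True)
y = 1.0 - 2.0 * x + 0.5 * x**2 + 0.05 * rng.standard_normal(x.size)
beta = np.array([0.3, -1.0, 0.2])  # arbitrary trial parameters

g_exact = grad_half(X, y, beta)
g_fd = numerical_grad(lambda b: cost_half(X, y, b), beta)
print(g_exact)
print(g_fd)
```

The two printed vectors agree to many digits. Because the cost is quadratic in $\boldsymbol{\beta}$, the central difference is exact apart from rounding error.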