Let us specialize first to the case where we have only two parameters \( \beta_0 \) and \( \beta_1 \). Our result for \( \beta_0 \) then simplifies to
$$ n\beta_0 = \sum_{i=0}^{n-1}y_i - \sum_{i=0}^{n-1} X_{i1} \beta_1. $$We then obtain
$$ \beta_0 = \frac{1}{n}\sum_{i=0}^{n-1}y_i - \beta_1\frac{1}{n}\sum_{i=0}^{n-1} X_{i1}. $$If we define
$$ \mu_1=\frac{1}{n}\sum_{i=0}^{n-1} X_{i1}, $$and if we define the mean value of the outputs as
$$ \mu_y=\frac{1}{n}\sum_{i=0}^{n-1}y_i, $$we have
$$ \beta_0 = \mu_y - \beta_1\mu_{1}. $$In the general case, that is, when we have more parameters than \( \beta_0 \) and \( \beta_1 \), we have
$$ \beta_0 = \frac{1}{n}\sum_{i=0}^{n-1}y_i - \frac{1}{n}\sum_{i=0}^{n-1}\sum_{j=1}^{p-1} X_{ij}\beta_j. $$Replacing \( y_i \) with \( y_i - \overline{\boldsymbol{y}} \), where \( \overline{\boldsymbol{y}} = \mu_y \) is the mean value of the outputs, and also centering our design matrix results in a cost function (in vector-matrix disguise)
$$ C(\boldsymbol{\beta}) = (\boldsymbol{\tilde{y}} - \tilde{X}\boldsymbol{\beta})^T(\boldsymbol{\tilde{y}} - \tilde{X}\boldsymbol{\beta}). $$
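The relations above can be checked numerically. The following is a minimal sketch in Python with NumPy, assuming synthetic data and hypothetical names (`X_tilde`, `y_tilde`, `beta_centered`, and so on) that are not part of the text. It fits ordinary least squares once with an explicit intercept column and once on centered data, where \( \boldsymbol{\beta} \) in the centered cost function contains only \( \beta_1, \dots, \beta_{p-1} \), and then recovers \( \beta_0 \) from the mean values.

```python
import numpy as np

# Synthetic data (an assumption for illustration): n samples, p parameters,
# where column 0 of X is the intercept column of ones.
rng = np.random.default_rng(2021)
n, p = 100, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([2.0, -1.0, 0.5, 3.0])
y = X @ beta_true + 0.1 * rng.normal(size=n)

# Fit 1: ordinary least squares with the intercept column included.
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]

# Fit 2: drop the intercept column, then center features and outputs.
X_features = X[:, 1:]
mu_X = X_features.mean(axis=0)   # mean value of each feature, mu_j
mu_y = y.mean()                  # mean value of the outputs, mu_y
X_tilde = X_features - mu_X
y_tilde = y - mu_y
beta_centered = np.linalg.lstsq(X_tilde, y_tilde, rcond=None)[0]

# Recover the intercept from beta_0 = mu_y - sum_j mu_j beta_j.
beta0 = mu_y - mu_X @ beta_centered

print("beta_0 from full fit:     ", beta_full[0])
print("beta_0 from centered fit: ", beta0)
print("slopes agree:", np.allclose(beta_full[1:], beta_centered))
```

Under these assumptions both fits should give the same slopes, and the intercept computed from the mean values should match the one from the full fit up to floating-point error.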