Minimizing the cross entropy

The cross entropy is a convex function of the weights \( \boldsymbol{\beta} \) and, therefore, any local minimizer is a global minimizer.

Minimizing this cost function with respect to the two parameters \( \beta_0 \) and \( \beta_1 \) we obtain

$$ \frac{\partial \mathcal{C}(\boldsymbol{\beta})}{\partial \beta_0} = -\sum_{i=1}^n \left(y_i -\frac{\exp{(\beta_0+\beta_1x_i)}}{1+\exp{(\beta_0+\beta_1x_i)}}\right), $$

and

$$ \frac{\partial \mathcal{C}(\boldsymbol{\beta})}{\partial \beta_1} = -\sum_{i=1}^n \left(y_ix_i -x_i\frac{\exp{(\beta_0+\beta_1x_i)}}{1+\exp{(\beta_0+\beta_1x_i)}}\right). $$