Week 40: Gradient descent methods (continued) and start Neural networks

Minimizing the cross entropy

The cross entropy is a convex function of the weights $ \boldsymbol{\theta} $ and, therefore, any local minimizer is a global minimizer.
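
The convexity claim can be checked directly for the two-parameter model. The following is a sketch, assuming the cost function has the usual form

$$ \mathcal{C}(\boldsymbol{\theta}) = -\sum_{i=1}^n \left(y_i(\theta_0+\theta_1x_i) -\log{\left(1+\exp{(\theta_0+\theta_1x_i)}\right)}\right), $$

and using the shorthand $ p_i = \exp{(\theta_0+\theta_1x_i)}/(1+\exp{(\theta_0+\theta_1x_i)}) $, which is notation introduced only for this sketch. Since $ \partial p_i/\partial \theta_0 = p_i(1-p_i) $ and $ \partial p_i/\partial \theta_1 = x_ip_i(1-p_i) $, the matrix of second derivatives is

$$ \frac{\partial^2 \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}\partial \boldsymbol{\theta}^T} = \sum_{i=1}^n p_i(1-p_i)\begin{bmatrix} 1 & x_i \\ x_i & x_i^2 \end{bmatrix}. $$

Each term is a non-negative number $ p_i(1-p_i) $ times the positive semidefinite matrix $ (1, x_i)^T(1, x_i) $, so the sum is positive semidefinite for every $ \boldsymbol{\theta} $, which is the second-order condition for convexity.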

To minimize this cost function we need its derivatives with respect to the two parameters $ \theta_0 $ and $ \theta_1 $. Differentiating, we obtain

$$ \frac{\partial \mathcal{C}(\boldsymbol{\theta})}{\partial \theta_0} = -\sum_{i=1}^n \left(y_i -\frac{\exp{(\theta_0+\theta_1x_i)}}{1+\exp{(\theta_0+\theta_1x_i)}}\right), $$

and

$$ \frac{\partial \mathcal{C}(\boldsymbol{\theta})}{\partial \theta_1} = -\sum_{i=1}^n \left(y_ix_i -x_i\frac{\exp{(\theta_0+\theta_1x_i)}}{1+\exp{(\theta_0+\theta_1x_i)}}\right). $$
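
The two derivatives above can be turned into a few lines of NumPy. The following is a minimal sketch, not a definitive implementation: the toy data `x` and `y`, the learning rate `eta`, the number of iterations and all function names are assumptions made for illustration, and the cost function is assumed to be the two-parameter cross entropy $ -\sum_i \left(y_i(\theta_0+\theta_1x_i)-\log{(1+\exp{(\theta_0+\theta_1x_i)})}\right) $. It evaluates both partial derivatives and uses them in plain gradient descent.

```python
import numpy as np


def sigmoid(t):
    # exp(t) / (1 + exp(t)), written in the equivalent form 1 / (1 + exp(-t))
    return 1.0 / (1.0 + np.exp(-t))


def cross_entropy(theta0, theta1, x, y):
    # C = -sum_i [ y_i * t_i - log(1 + exp(t_i)) ] with t_i = theta0 + theta1 * x_i
    t = theta0 + theta1 * x
    return -np.sum(y * t - np.logaddexp(0.0, t))


def cross_entropy_gradient(theta0, theta1, x, y):
    # The two partial derivatives, term by term as in the equations above
    p = sigmoid(theta0 + theta1 * x)
    d_theta0 = -np.sum(y - p)
    d_theta1 = -np.sum(y * x - x * p)
    return d_theta0, d_theta1


# Hypothetical toy data (not linearly separable, so a finite minimizer exists)
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])

# Plain gradient descent with an assumed fixed learning rate
theta0, theta1 = 0.0, 0.0
eta = 0.1
for _ in range(2000):
    d0, d1 = cross_entropy_gradient(theta0, theta1, x, y)
    theta0 -= eta * d0
    theta1 -= eta * d1

print("theta0, theta1:", theta0, theta1)
print("cost:", cross_entropy(theta0, theta1, x, y))
print("gradient:", cross_entropy_gradient(theta0, theta1, x, y))
```

At the end of the loop both gradient components should be close to zero. Because the cost is convex, a point with vanishing gradient is a global minimizer. The learning rate here was picked by hand for this small data set; other data will generally need a different value.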