Week 42 Constructing a Neural Network code with examples

Loading [MathJax]/extensions/TeX/boldsymbol.js

Derivative of the cost function

With these definitions we can now compute the derivative of the cost function in terms of the weights.

Let us specialize to the output layer $l=L$ . Our cost function is

${\cal C}(\boldsymbol{\Theta}^L) = \frac{1}{2}\sum_{i=1}^n\left(y_i - \tilde{y}_i\right)^2=\frac{1}{2}\sum_{i=1}^n\left(a_i^L - y_i\right)^2,$

The derivative of this function with respect to the weights is

$\frac{\partial{\cal C}(\boldsymbol{\Theta}^L)}{\partial w_{ij}^L} = \left(a_j^L - y_j\right)\frac{\partial a_j^L}{\partial w_{ij}^{L}},$

The last partial derivative can easily be computed and reads (by applying the chain rule)

$\frac{\partial a_j^L}{\partial w_{ij}^{L}} = \frac{\partial a_j^L}{\partial z_{j}^{L}}\frac{\partial z_j^L}{\partial w_{ij}^{L}}=a_j^L(1-a_j^L)a_i^{L-1}.$