Derivative of the cost function

With these definitions we can now compute the derivative of the cost function with respect to the weights.

Let us specialize to the output layer l=L. Our cost function is

{\cal C}(\boldsymbol{\Theta}^L) = \frac{1}{2}\sum_{i=1}^n\left(y_i - \tilde{y}_i\right)^2=\frac{1}{2}\sum_{i=1}^n\left(a_i^L - y_i\right)^2.

Only the output a_j^L depends on the weight w_{ij}^L, so a single term of the sum survives and the derivative of this function with respect to the weights is

\frac{\partial{\cal C}(\boldsymbol{\Theta}^L)}{\partial w_{ij}^L} = \left(a_j^L - y_j\right)\frac{\partial a_j^L}{\partial w_{ij}^{L}}.

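The last partial derivative follows from the chain rule. The factor a_j^L(1-a_j^L) below is the derivative of the sigmoid function, so the result holds for a sigmoid activation in the output layer; the factor a_i^{L-1} appears because z_j^L is linear in the weights. It reads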

\frac{\partial a_j^L}{\partial w_{ij}^{L}} = \frac{\partial a_j^L}{\partial z_{j}^{L}}\frac{\partial z_j^L}{\partial w_{ij}^{L}}=a_j^L(1-a_j^L)a_i^{L-1}.
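
The expression above can be checked numerically. The following is a minimal sketch, not a definitive implementation. It assumes a sigmoid output layer with z_j^L = \sum_i w_{ij}^L a_i^{L-1} + b_j^L (the bias term and all function names here are illustrative assumptions, not part of the text above), and it compares the analytic gradient \left(a_j^L - y_j\right)a_j^L(1-a_j^L)a_i^{L-1} with a central finite-difference estimate of the cost.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(W, b, a_prev):
    """Output activations a_j = sigmoid(sum_i W[i][j] * a_prev[i] + b[j])."""
    n_in, n_out = len(W), len(W[0])
    return [
        sigmoid(sum(W[i][j] * a_prev[i] for i in range(n_in)) + b[j])
        for j in range(n_out)
    ]


def cost(W, b, a_prev, y):
    """Quadratic cost C = 1/2 * sum_j (a_j - y_j)^2."""
    a = forward(W, b, a_prev)
    return 0.5 * sum((aj - yj) ** 2 for aj, yj in zip(a, y))


def analytic_grad(W, b, a_prev, y):
    """dC/dW[i][j] = (a_j - y_j) * a_j * (1 - a_j) * a_prev[i]."""
    a = forward(W, b, a_prev)
    return [
        [(a[j] - y[j]) * a[j] * (1.0 - a[j]) * a_prev[i] for j in range(len(a))]
        for i in range(len(a_prev))
    ]


def numerical_grad(W, b, a_prev, y, eps=1e-6):
    """Central finite-difference estimate of dC/dW[i][j]."""
    grad = [[0.0] * len(W[0]) for _ in W]
    for i in range(len(W)):
        for j in range(len(W[0])):
            original = W[i][j]
            W[i][j] = original + eps
            c_plus = cost(W, b, a_prev, y)
            W[i][j] = original - eps
            c_minus = cost(W, b, a_prev, y)
            W[i][j] = original  # restore the weight
            grad[i][j] = (c_plus - c_minus) / (2 * eps)
    return grad


# Small random test case (sizes and values are arbitrary illustrations).
random.seed(0)
n_in, n_out = 3, 2
W = [[random.uniform(-1, 1) for _ in range(n_out)] for _ in range(n_in)]
b = [random.uniform(-1, 1) for _ in range(n_out)]
a_prev = [random.random() for _ in range(n_in)]
y = [0.0, 1.0]

ga = analytic_grad(W, b, a_prev, y)
gn = numerical_grad(W, b, a_prev, y)
max_diff = max(
    abs(ga[i][j] - gn[i][j]) for i in range(n_in) for j in range(n_out)
)
print("largest difference between analytic and numerical gradient:", max_diff)
```

If the two gradients agree to within the finite-difference error, the analytic formula is consistent with the cost function as defined here. In this sketch the weight matrix is indexed as W[i][j], with i running over the nodes of layer L-1 and j over the output nodes, to mirror the index order of w_{ij}^L.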