Backpropagation through time in equations

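To derive the gradients of \mathcal{L} for the RNN, we work recursively through the time-unrolled graph, starting from the nodes closest to the output layer: the outputs \mathbf{y}^{(t)} at each time step and the hidden state \mathbf{h}^{(\tau)} at the final time t = \tau. At the final time step, \mathbf{h}^{(\tau)} influences the loss only through \mathbf{y}^{(\tau)}, so its gradient has a single term: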

\begin{align*}
(\nabla_{\mathbf{y}^{(t)}} \mathcal{L})_{i} &= \frac{\partial \mathcal{L}}{\partial L^{(t)}}\frac{\partial L^{(t)}}{\partial y_{i}^{(t)}}, \\
\nabla_{\mathbf{h}^{(\tau)}} \mathcal{L} &= \mathbf{V}^\mathsf{T}\nabla_{\mathbf{y}^{(\tau)}} \mathcal{L}.
\end{align*}
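The two expressions above can be checked numerically on a toy example. The sketch below is illustrative only and rests on assumptions that the text does not state: the output is taken to be y = V h + c (the bias `c` is hypothetical), the per-step loss is softmax cross-entropy, and the total loss is the sum of the per-step losses, so that \partial \mathcal{L} / \partial L^{(t)} = 1. Under those assumptions the output gradient is softmax(y) minus the one-hot target, and the hidden-state gradient at the final step is V^T times that vector. All function names and numerical values are made up for the example.

```python
import math

def softmax(y):
    # Numerically stable softmax over a list of floats.
    m = max(y)
    exps = [math.exp(v - m) for v in y]
    s = sum(exps)
    return [e / s for e in exps]

def output(V, c, h):
    # Assumed output layer: y = V h + c (the bias c is an assumption).
    return [sum(V[i][j] * h[j] for j in range(len(h))) + c[i]
            for i in range(len(V))]

def loss_from_h(V, c, h, target):
    # Assumed per-step loss: cross-entropy of softmax(y) against the target class.
    return -math.log(softmax(output(V, c, h))[target])

def grad_y(y, target):
    # (grad_y L)_i = dL/dL^(t) * dL^(t)/dy_i, with dL/dL^(t) = 1 (summed loss)
    # and softmax cross-entropy giving softmax(y)_i - 1[i == target].
    p = softmax(y)
    return [p[i] - (1.0 if i == target else 0.0) for i in range(len(y))]

def grad_h_final(V, g_y):
    # grad_h L at the final step = V^T grad_y L.
    return [sum(V[i][j] * g_y[i] for i in range(len(V)))
            for j in range(len(V[0]))]

# Toy values: 2 output units, 3 hidden units.
V = [[0.5, -0.2, 0.1],
     [0.3, 0.8, -0.5]]
c = [0.1, -0.1]
h = [0.2, -0.4, 0.7]
target = 1

g_y = grad_y(output(V, c, h), target)
g_h = grad_h_final(V, g_y)

# Central finite differences on h as an independent check of g_h.
eps = 1e-6
g_h_fd = []
for j in range(len(h)):
    h_plus = list(h)
    h_plus[j] += eps
    h_minus = list(h)
    h_minus[j] -= eps
    diff = loss_from_h(V, c, h_plus, target) - loss_from_h(V, c, h_minus, target)
    g_h_fd.append(diff / (2 * eps))

print("analytic:", g_h)
print("numeric: ", g_h_fd)
```

The finite-difference comparison applies only to the final time step. For t < \tau the hidden state also feeds into the next hidden state, so its gradient picks up an additional term that this sketch does not cover.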