Summary of a typical RNN
- Weight matrices $U$, $W$, and $V$ that connect, respectively, the input at stage $t$ with the hidden layer $h_t$, the previous hidden layer $h_{t-1}$ with $h_t$, and the hidden layer $h_t$ with the output layer at the same stage, producing an output $\tilde{y}_t$.
- The output from the hidden layer $h_t$ is often modulated by a $\tanh$ function, $h_t = \sigma_h(x_t, h_{t-1}) = \tanh(Ux_t + Wh_{t-1} + b)$, with $b$ a bias vector.
- The hidden layer in turn produces the output $\tilde{y}_t = \sigma_y(Vh_t + c)$, where $c$ is another bias parameter.
- The output $\tilde{y}_t$ at a given stage is in turn compared with the observation $y_t$ through a chosen cost function (a minimal sketch of this forward pass in code follows this list).
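As a minimal sketch of the equations above, assuming NumPy and illustrative layer sizes (the names `rnn_step`, `n_in`, `n_hidden`, and `n_out` are hypothetical, not from the text), a single forward step could look as follows. The output activation $\sigma_y$ is taken as the identity here for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 3, 5, 2               # assumed layer sizes

U = rng.standard_normal((n_hidden, n_in))     # input  -> hidden
W = rng.standard_normal((n_hidden, n_hidden)) # hidden -> hidden
V = rng.standard_normal((n_out, n_hidden))    # hidden -> output
b = np.zeros(n_hidden)                        # hidden-layer bias
c = np.zeros(n_out)                           # output-layer bias

def rnn_step(x_t, h_prev):
    """One stage t: h_t = tanh(U x_t + W h_{t-1} + b), y_t = V h_t + c."""
    h_t = np.tanh(U @ x_t + W @ h_prev + b)
    y_t = V @ h_t + c          # sigma_y taken as the identity here
    return h_t, y_t

x_t = rng.standard_normal(n_in)
h_prev = np.zeros(n_hidden)    # initial hidden state
h_t, y_t = rnn_step(x_t, h_prev)
```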
The activation function $\sigma_y$ can be any of the standard activation functions, that is, a Sigmoid, a Softmax, a ReLU, or others.
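Written out in NumPy, these standard choices could look as follows (a sketch; any of them could play the role of $\sigma_y$):

```python
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))   # shift for numerical stability
    return e / e.sum()

def relu(z):
    return np.maximum(0.0, z)
```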
The parameters are trained through the so-called back-propagation through time (BPTT) algorithm.
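A hedged sketch of BPTT for the cell defined above, assuming a mean-squared-error cost and the identity output activation used in `rnn_step` (the function name `bptt` and its signature are hypothetical): the forward pass is unrolled over the sequence while storing the hidden states, and the gradients are then accumulated stage by stage in reverse.

```python
def bptt(xs, ys, h0):
    """xs, ys: lists of input/target vectors; returns parameter gradients."""
    # forward pass, storing hidden states for the backward sweep
    hs, yhats = [h0], []
    for x_t in xs:
        h_t, y_t = rnn_step(x_t, hs[-1])
        hs.append(h_t)
        yhats.append(y_t)
    # backward pass, accumulating gradients over the unrolled stages
    dU, dW, dV = np.zeros_like(U), np.zeros_like(W), np.zeros_like(V)
    db, dc = np.zeros_like(b), np.zeros_like(c)
    dh_next = np.zeros_like(h0)
    for t in reversed(range(len(xs))):
        dy = yhats[t] - ys[t]              # d cost / d y_t for the MSE cost
        dV += np.outer(dy, hs[t + 1])
        dc += dy
        dh = V.T @ dy + dh_next            # contributions from output and future stages
        da = dh * (1.0 - hs[t + 1] ** 2)   # back through the tanh
        dU += np.outer(da, xs[t])
        dW += np.outer(da, hs[t])
        db += da
        dh_next = W.T @ da                 # carried back to stage t-1
    return dU, dW, dV, db, dc
```

The carried term `dh_next` is what distinguishes BPTT from ordinary back-propagation: the gradient at stage $t$ receives a contribution from every later stage through the recurrent matrix $W$.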