Summary of a typical RNN

  1. Weight matrices $U$, $W$ and $V$ that connect, respectively, the input layer at a stage $t$ with the hidden layer $h_t$, the previous hidden layer $h_{t-1}$ with $h_t$, and the hidden layer $h_t$ with the output layer at the same stage, producing an output $\tilde{y}_t$.
  2. The output from the hidden layer $h_t$ is often modulated by a $\tanh$ function, $h_t=\sigma_h(x_t,h_{t-1})=\tanh(Ux_t+Wh_{t-1}+b)$, with $b$ a bias value.
  3. The output from the hidden layer produces $\tilde{y}_t=\sigma_y(Vh_t+c)$, where $c$ is a new bias parameter.
  4. The output from the training at a given stage is in turn compared with the observation $y_t$ through a chosen cost function, as illustrated in the sketch after this list.
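
To make these steps concrete, here is a minimal forward-pass sketch in Python with NumPy. The dimensions, the random dummy data and the identity output activation $\sigma_y$ are illustrative assumptions, not values taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed, illustrative dimensions.
n_input, n_hidden, n_output = 3, 5, 2
T = 4                                   # number of stages (time steps)

U = rng.normal(scale=0.1, size=(n_hidden, n_input))   # input  -> hidden
W = rng.normal(scale=0.1, size=(n_hidden, n_hidden))  # hidden -> hidden
V = rng.normal(scale=0.1, size=(n_output, n_hidden))  # hidden -> output
b = np.zeros(n_hidden)                                # hidden bias
c = np.zeros(n_output)                                # output bias

x = rng.normal(size=(T, n_input))       # a dummy input sequence x_1 ... x_T

h = np.zeros(n_hidden)                  # initial hidden state h_0
for t in range(T):
    h = np.tanh(U @ x[t] + W @ h + b)   # h_t = tanh(U x_t + W h_{t-1} + b)
    y_tilde = V @ h + c                  # y~_t = sigma_y(V h_t + c), identity sigma_y here
    print(f"stage {t}: output {y_tilde}")
```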

The output activation function $\sigma_y$ can be any of the standard activation functions, that is, a Sigmoid, a Softmax, a ReLU or others. The parameters are trained through the so-called back-propagation through time (BPTT) algorithm.
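
To indicate what BPTT amounts to, the sketch below unrolls the network over the $T$ stages and accumulates the gradients of an assumed mean squared error cost, reusing the arrays $U$, $W$, $V$, $b$, $c$ and $x$ from the forward-pass sketch above; the targets $y_t$ are dummy data.

```python
# Reuses rng, T, n_hidden, n_output, U, W, V, b, c and x from the sketch above.
y = rng.normal(size=(T, n_output))       # dummy observations y_t

# Forward pass, storing every hidden state for the backward pass.
hs = [np.zeros(n_hidden)]                # hs[0] = h_0, hs[t+1] = h_t
outs = []
for t in range(T):
    hs.append(np.tanh(U @ x[t] + W @ hs[-1] + b))
    outs.append(V @ hs[-1] + c)

# Backward pass: gradients accumulate over all stages t = T, ..., 1.
dU, dW, dV = np.zeros_like(U), np.zeros_like(W), np.zeros_like(V)
db, dc = np.zeros_like(b), np.zeros_like(c)
dh_next = np.zeros(n_hidden)             # gradient arriving from stage t+1
for t in reversed(range(T)):
    dy = outs[t] - y[t]                  # dC/dy~_t for an MSE cost
    dV += np.outer(dy, hs[t + 1])
    dc += dy
    dh = V.T @ dy + dh_next              # total gradient reaching h_t
    dz = (1.0 - hs[t + 1] ** 2) * dh     # back through tanh: h_t = tanh(z_t)
    dU += np.outer(dz, x[t])
    dW += np.outer(dz, hs[t])
    db += dz
    dh_next = W.T @ dz                   # gradient passed back to h_{t-1}
```

A plain gradient-descent update would then read, for instance, `U -= eta * dU` for some chosen learning rate `eta`, and similarly for the other parameters.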