Four effective ways to learn an RNN

  1. Long Short Term Memory: Make the RNN out of little modules that are designed to remember values for a long time.
  2. Hessian-Free Optimization: Deal with the vanishing gradients problem by using a fancy optimizer that can detect directions with a tiny gradient but even smaller curvature.
  3. Echo State Networks: Initialize the input→hidden, hidden→hidden, and output→hidden connections very carefully so that the hidden state has a huge reservoir of weakly coupled oscillators which can be selectively driven by the input.
  4. Good initialization with momentum: Initialize as in Echo State Networks, but then learn all of the connections using momentum (a sketch of this kind of initialization follows the list).
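
As a rough illustration of the careful initialization in methods 3 and 4, here is a minimal Python/NumPy sketch. The helper name, the uniform weight ranges, and the spectral radius of 0.9 are illustrative assumptions, not prescribed values; the key idea is rescaling the hidden→hidden weights so that their largest eigenvalue magnitude sits just below 1, which keeps the reservoir's dynamics from exploding or dying out.

```python
import numpy as np

def esn_style_init(n_in, n_hidden, spectral_radius=0.9, seed=0):
    """Hypothetical helper: Echo State Network style initialization.

    The hidden->hidden weight matrix is rescaled so that its spectral
    radius (largest eigenvalue magnitude) sits just below 1, giving a
    reservoir of weakly coupled oscillators that the input can drive
    selectively.
    """
    rng = np.random.default_rng(seed)
    W_ih = rng.uniform(-0.1, 0.1, size=(n_hidden, n_in))      # input -> hidden
    W_hh = rng.uniform(-0.5, 0.5, size=(n_hidden, n_hidden))  # hidden -> hidden
    # Rescale so the largest eigenvalue magnitude equals spectral_radius.
    W_hh *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W_hh)))
    return W_ih, W_hh
```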

Long Short Term Memory (LSTM)

LSTM uses a memory cell to model long-range dependencies and avoid the vanishing gradient problem.

  1. Introduced by Hochreiter and Schmidhuber (1997), who solved the problem of getting an RNN to remember things for a long time (like hundreds of time steps).
  2. They designed a memory cell using logistic and linear units with multiplicative interactions.
  3. Information gets into the cell whenever its “write” gate is on.
  4. The information stays in the cell so long as its “keep” gate is on.
  5. Information can be read from the cell by turning on its “read” gate (a minimal sketch of these gates follows).
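
To make the gate roles concrete, here is a minimal single-step sketch of an LSTM-style cell in NumPy. The parameter names (`W_i`, `b_i`, and so on) and the concatenated input-plus-hidden vector are assumptions for illustration, not a fixed API; in modern notation the “write”, “keep”, and “read” gates correspond to the input, forget, and output gates.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, p):
    """One step of an LSTM-style memory cell (illustrative sketch).

    `p` is a dict of weight matrices and bias vectors; the names
    are assumptions made for this example.
    """
    z = np.concatenate([x, h_prev])       # current input + previous hidden state
    i = sigmoid(p["W_i"] @ z + p["b_i"])  # "write" gate (input gate)
    f = sigmoid(p["W_f"] @ z + p["b_f"])  # "keep" gate (forget gate)
    o = sigmoid(p["W_o"] @ z + p["b_o"])  # "read" gate (output gate)
    g = np.tanh(p["W_g"] @ z + p["b_g"])  # candidate value to store
    c = f * c_prev + i * g                # keep old contents and/or write new ones
    h = o * np.tanh(c)                    # read the cell into the hidden state
    return h, c
```

Because every gate is a smooth logistic unit, the whole step is differentiable, which is what makes this multiplicative gating trainable by backpropagation.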

Implementing a memory cell in a neural network

To preserve information for a long time in the activities of an RNN, we use a circuit that implements an analog memory cell.

  1. A linear unit that has a self-link with a weight of 1 will maintain its state.
  2. Information is stored in the cell by activating its write gate.
  3. Information is retrieved by activating the read gate.
  4. We can backpropagate through this circuit because logistic units have nice derivatives (see the sketch below).
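
A minimal numeric sketch of this circuit, assuming the gate activations come from logistic units elsewhere in the network; the function and argument names are illustrative, and the “keep” gate from the previous section is included on the self-link:

```python
def memory_cell_step(state, value, keep, write, read):
    """One step of the analog memory cell.

    `state` is the activity of a linear unit with a self-link of
    weight 1; `keep`, `write`, and `read` are gate activations in
    (0, 1), e.g. outputs of logistic units, so every operation here
    is differentiable and backpropagation works as usual.
    """
    state = keep * state + write * value  # keep old contents, write new ones
    output = read * state                 # read gate releases the stored value
    return state, output

# Store 1.3, hold it for a step, then read it back (illustrative values).
state = 0.0
state, _ = memory_cell_step(state, 1.3, keep=0.0, write=1.0, read=0.0)
state, _ = memory_cell_step(state, 0.0, keep=1.0, write=0.0, read=0.0)
state, out = memory_cell_step(state, 0.0, keep=1.0, write=0.0, read=1.0)
print(out)  # 1.3: the value survived intact across time steps
```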