Week 47: Recurrent neural networks and Autoencoders

The forget gate

The name forget gate stems from the fact that the sigmoid activation function outputs values very close to $0$ when its argument is very negative and very close to $1$ when its argument is very positive. The gate can therefore control how much information we keep from the long-term memory: an output near $0$ forgets the corresponding entry, while an output near $1$ retains it.

$$ \mathbf{f}^{(t)} = \sigma(W_{fx}\mathbf{x}^{(t)} + W_{fh}\mathbf{h}^{(t-1)} + \mathbf{b}_f) $$

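where $\mathbf{x}^{(t)}$ is the input at time step $t$, $\mathbf{h}^{(t-1)}$ is the hidden state from the previous time step, and the weight matrices $W_{fx}$ and $W_{fh}$ and the bias $\mathbf{b}_f$ are the parameters to be trained.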
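
As an illustration, the forget gate equation can be sketched in a few lines of NumPy. This is a minimal sketch and not a full LSTM cell: the dimensions, the random weights, and the names `forget_gate`, `c_prev`, and `retained` are assumptions made for the example. In particular, the elementwise product with a previous long-term memory `c_prev` follows the usual LSTM convention and is not written out in the equation above.

```python
import numpy as np


def sigmoid(z):
    """Elementwise logistic function, mapping real numbers into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def forget_gate(x_t, h_prev, W_fx, W_fh, b_f):
    """Compute f^(t) = sigma(W_fx x^(t) + W_fh h^(t-1) + b_f)."""
    return sigmoid(W_fx @ x_t + W_fh @ h_prev + b_f)


# Hypothetical sizes and randomly drawn (untrained) parameters.
rng = np.random.default_rng(0)
n_in, n_hidden = 3, 4
x_t = rng.normal(size=n_in)          # input at time step t
h_prev = rng.normal(size=n_hidden)   # hidden state from step t-1
W_fx = rng.normal(size=(n_hidden, n_in))
W_fh = rng.normal(size=(n_hidden, n_hidden))
b_f = np.zeros(n_hidden)

f_t = forget_gate(x_t, h_prev, W_fx, W_fh, b_f)

# Assumed use of the gate: scale each entry of the long-term memory.
c_prev = rng.normal(size=n_hidden)
retained = f_t * c_prev

print("forget gate:", f_t)
print("retained memory:", retained)
```

Each entry of `f_t` lies strictly between 0 and 1, so the product can only shrink the corresponding memory entry. A strongly negative pre-activation pushes an entry toward 0 (forget), and a strongly positive one pushes it toward 1 (keep).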