Week 47: Recurrent neural networks and Autoencoders

LSTM details

The first stage is called the forget gate, where we combine the input at (say, time $ t $), and the hidden cell state input at $ t-1 $, passing it through the Sigmoid activation function and then performing an element-wise multiplication, denoted by $ \odot $.

Mathematically we have (see also figure below)

$$ \mathbf{f}^{(t)} = \sigma(W_{fx}\mathbf{x}^{(t)} + W_{fh}\mathbf{h}^{(t-1)} + \mathbf{b}_f) $$

where the $W$s are the weights to be trained.