More LSTM details

The first stage is the forget gate. It combines the input at time t with the hidden state from time t-1 and passes the result through the sigmoid activation function. The gate's output, whose entries lie between 0 and 1, is then multiplied element-wise (denoted by \otimes ) with the previous cell state, which determines how much of that state is kept.

The forget gate is computed as

\mathbf{f}^{(t)} = \sigma(W_f\mathbf{x}^{(t)} + U_f\mathbf{h}^{(t-1)} + \mathbf{b}_f)

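where W_f and U_f are the weight matrices applied to the input \mathbf{x}^{(t)} and the previous hidden state \mathbf{h}^{(t-1)}, respectively, \mathbf{b}_f is the bias vector, and \sigma is the sigmoid function.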
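The forget gate equation can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function names, the dimensions (3 inputs, 4 hidden units) and the random weights are all assumptions made for the example, and the cell state `c_prev` is included only to show the element-wise multiplication used in the standard LSTM formulation.

```python
import numpy as np


def sigmoid(z):
    """Logistic sigmoid, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))


def forget_gate(x_t, h_prev, W_f, U_f, b_f):
    """Compute f^(t) = sigma(W_f x^(t) + U_f h^(t-1) + b_f)."""
    return sigmoid(W_f @ x_t + U_f @ h_prev + b_f)


# Hypothetical sizes and randomly initialised parameters (illustration only).
rng = np.random.default_rng(0)
n_in, n_hidden = 3, 4
W_f = rng.normal(size=(n_hidden, n_in))      # input weights
U_f = rng.normal(size=(n_hidden, n_hidden))  # recurrent weights
b_f = np.zeros(n_hidden)                     # bias

x_t = rng.normal(size=n_in)         # input at time t
h_prev = np.zeros(n_hidden)         # hidden state at time t-1
c_prev = rng.normal(size=n_hidden)  # cell state at time t-1 (assumed)

f_t = forget_gate(x_t, h_prev, W_f, U_f, b_f)

# Element-wise product: each entry of f_t scales the matching entry of c_prev.
c_kept = f_t * c_prev
```

Because every entry of `f_t` lies strictly between 0 and 1, each component of `c_kept` is a shrunken copy of the corresponding component of `c_prev`: values near 1 keep that part of the memory, values near 0 discard it.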