The first stage is the forget gate. Here we combine the input at time t with the hidden state from time t-1 and pass the result through the sigmoid activation function, which gives a value between 0 and 1 for each component. This output is then multiplied element-wise, denoted by \otimes , with the previous cell state, so it controls how much of that state is kept.
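The gate is computed as
\mathbf{f}^{(t)} = \sigma(W_f\mathbf{x}^{(t)} + U_f\mathbf{h}^{(t-1)} + \mathbf{b}_f)
where W_f and U_f are the weight matrices applied to the input and to the previous hidden state, respectively, and \mathbf{b}_f is the bias vector.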
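
To make the forget gate concrete, here is a minimal NumPy sketch of the formula. The function names sigmoid and forget_gate, the sizes, and the random weights are all assumptions made for illustration and are not taken from any particular library. The previous cell state c_prev is also an assumed name; it is included only to show where the element-wise product is applied.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def forget_gate(x_t, h_prev, W_f, U_f, b_f):
    """Return f_t = sigmoid(W_f x_t + U_f h_prev + b_f)."""
    return sigmoid(W_f @ x_t + U_f @ h_prev + b_f)

# Hypothetical sizes, chosen only for this example.
input_size, hidden_size = 3, 4
rng = np.random.default_rng(0)

x_t = rng.standard_normal(input_size)      # input at time t
h_prev = rng.standard_normal(hidden_size)  # hidden state at time t-1
c_prev = rng.standard_normal(hidden_size)  # cell state at time t-1 (assumed name)

# Randomly initialised parameters; a trained network would learn these.
W_f = rng.standard_normal((hidden_size, input_size))
U_f = rng.standard_normal((hidden_size, hidden_size))
b_f = np.zeros(hidden_size)

f_t = forget_gate(x_t, h_prev, W_f, U_f, b_f)

# Element-wise product: each entry of f_t scales the matching entry of c_prev.
kept = f_t * c_prev
```

Because every entry of f_t lies strictly between 0 and 1, each entry of kept is a shrunken copy of the corresponding entry of c_prev: values near 1 keep most of the old state, values near 0 discard it.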