The problem of exploding or vanishing gradients
- What happens to the magnitude of the gradients as we backpropagate through many layers?
- If the weights are small, the gradients shrink exponentially.
- If the weights are big, the gradients grow exponentially.
- Typical feed-forward neural nets can cope with these exponential effects because they only have a few hidden layers.
- In an RNN trained on long sequences (e.g. 100 time steps) the gradients can easily explode or vanish (a numerical sketch follows this list).
- We can avoid this by initializing the weights very carefully.
- Even with good initial weights, it's very hard to detect that the current target output depends on an input from many time-steps ago.
- So RNNs have difficulty dealing with long-range dependencies.
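The exponential effect described above can be seen directly by pushing a gradient vector backwards through a toy linear recurrence: each step of backpropagation through time multiplies the gradient by the transposed recurrent weight matrix. The sketch below is illustrative only; the function name, hidden size, and weight scales are made up for this example, and a real RNN would additionally multiply by the derivative of the nonlinearity at each step, which pushes things further towards vanishing.

```python
import numpy as np

def backprop_gradient_norms(w_scale, hidden_size=50, steps=100, seed=0):
    """Push a gradient backwards through `steps` time steps of a toy
    linear RNN (h_t = W_hh @ h_{t-1}) and record its norm at each step."""
    rng = np.random.default_rng(seed)
    # Recurrent weights scaled so their spectral radius is roughly w_scale:
    # "small" weights (< 1) shrink gradients, "big" weights (> 1) grow them.
    W_hh = w_scale * rng.standard_normal((hidden_size, hidden_size)) / np.sqrt(hidden_size)
    grad = rng.standard_normal(hidden_size)   # gradient at the final time step
    norms = []
    for _ in range(steps):
        grad = W_hh.T @ grad                  # one step of backprop through time
        norms.append(np.linalg.norm(grad))
    return norms

for w_scale in (0.5, 1.0, 1.5):
    final_norm = backprop_gradient_norms(w_scale)[-1]
    print(f"w_scale={w_scale}: |grad| after 100 steps = {final_norm:.3e}")
```

Running this, the gradient norm collapses towards zero for w_scale=0.5 and blows up for w_scale=1.5, while scaling the weights so the spectral radius is close to 1 (as careful initialization tries to do) keeps the norm roughly stable over the 100 steps.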