Week 42 Constructing a Neural Network code with examples

Important observations

From the above equations we see that the derivatives of the activation functions play a central role. If they vanish, the gradients of the cost function with respect to the parameters vanish as well, and the training may stall. This is called the vanishing gradient problem, see discussions below. If they become large, the parameters $w_i$ and $b_i$ may grow without bound. This is referred to as the exploding gradient problem.
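
To make the two regimes concrete, here is a minimal sketch of a toy model, not the full back propagation algorithm. It assumes a chain of single-neuron layers with sigmoid activations, where every layer has the same weight and the same pre-activation $z$. Under these assumptions the back-propagated error picks up one factor $w\sigma'(z)$ per layer, so it scales as $(w\sigma'(z))^L$ for $L$ layers. The function names and the numerical values are chosen for illustration only.

```python
import math


def sigmoid(z):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_derivative(z):
    """Derivative of the sigmoid; its maximum value is 0.25, reached at z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)


def backprop_factor(weight, z, n_layers):
    """Product of n_layers identical factors weight * sigmoid'(z).

    Toy assumption: every layer is a single neuron with the same weight
    and the same pre-activation z.
    """
    return (weight * sigmoid_derivative(z)) ** n_layers


# Each factor is 1.0 * 0.25 < 1, so the product shrinks with depth.
vanishing = backprop_factor(weight=1.0, z=0.0, n_layers=20)

# Each factor is 8.0 * 0.25 = 2 > 1, so the product grows with depth.
exploding = backprop_factor(weight=8.0, z=0.0, n_layers=20)

print(f"vanishing case: {vanishing:.3e}")
print(f"exploding case: {exploding:.3e}")
```

Since $\sigma'(z) \le 1/4$ for the sigmoid, the factor per layer is smaller than one unless the weights are large, which is why deep networks with sigmoid activations are particularly prone to vanishing gradients.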