Having backpropagated the error, computing for each $l = L-1, L-2, \ldots, 1$
\[
\delta^l_j = \sum_k w^{l+1}_{kj}\,\delta^{l+1}_k\,\sigma'(z^l_j),
\]
we update the weights and biases by gradient descent, applying for each $l = L-1, L-2, \ldots, 1$ the rules
\[
w^l_{jk} \leftarrow w^l_{jk} - \eta\,\frac{\partial C}{\partial w^l_{jk}} = w^l_{jk} - \eta\,\delta^l_j\,a^{l-1}_k,
\qquad
b^l_j \leftarrow b^l_j - \eta\,\frac{\partial C}{\partial b^l_j} = b^l_j - \eta\,\delta^l_j.
\]
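To make the two steps concrete, here is a minimal NumPy sketch of a single backward pass with an immediate gradient-descent update. The function name `backprop_update`, the 0-indexed lists `weights`, `biases`, `activations`, `zs`, the output-layer error `delta_L`, and the sigmoid derivative `sigma_prime` are illustrative choices rather than notation from the text; the loop simply applies the error-propagation formula and the update rules above.

```python
import numpy as np

def sigma_prime(z):
    """Derivative of the sigmoid activation sigma(z) = 1 / (1 + exp(-z))."""
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)

def backprop_update(weights, biases, activations, zs, delta_L, eta):
    """One backward pass followed by a gradient-descent step per layer.

    weights[l], biases[l] : parameters feeding layer l+1 (0-indexed lists)
    activations[l]        : a^l, with activations[0] the network input
    zs[l]                 : weighted input weights[l] @ activations[l] + biases[l]
    delta_L               : error at the output layer
    eta                   : learning rate
    """
    L = len(weights)
    delta = delta_L
    for l in range(L - 1, -1, -1):
        # Gradients: dC/dw^l_{jk} = delta^l_j * a^{l-1}_k and dC/db^l_j = delta^l_j
        grad_w = np.outer(delta, activations[l])
        grad_b = delta
        if l > 0:
            # Propagate the error one layer back before overwriting weights[l]:
            # delta^{l-1}_j = sum_k w^l_{kj} delta^l_k * sigma'(z^{l-1}_j)
            delta = (weights[l].T @ delta) * sigma_prime(zs[l - 1])
        # Gradient-descent update rules from the text
        weights[l] -= eta * grad_w
        biases[l] -= eta * grad_b
    return weights, biases
```

Here the parameters are updated layer by layer as the error flows backwards; an equally common choice is to accumulate all gradients first (for example over a mini-batch) and apply a single update at the end.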