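For this specific model, with just one output node and two hidden nodes, the gradient descent equations take the following form. For the output layer we have
$$
w_i^{(2)} \leftarrow w_i^{(2)} - \eta \delta^{(2)} a_i^{(1)},
$$
and
$$
b^{(2)} \leftarrow b^{(2)} - \eta \delta^{(2)},
$$
and for the hidden layer
$$
w_{ij}^{(1)} \leftarrow w_{ij}^{(1)} - \eta \delta_i^{(1)} a_j^{(0)},
$$
and
$$
b_i^{(1)} \leftarrow b_i^{(1)} - \eta \delta_i^{(1)},
$$
where $\eta$ is the learning rate.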
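The four update rules above can be sketched as a single function. This is a minimal illustration in plain Python, not a complete training loop: it assumes the errors $\delta^{(2)}$ and $\delta_i^{(1)}$ and the activations $a_j^{(0)}$ and $a_i^{(1)}$ have already been computed by a forward and backward pass. The function name `gradient_descent_step` and its argument names are hypothetical, chosen only for this example.

```python
def gradient_descent_step(w2, b2, w1, b1, delta2, delta1, a0, a1, eta):
    """Apply one gradient descent update (hypothetical helper).

    w2     : list, hidden-to-output weights w_i^(2), one per hidden node
    b2     : float, output bias b^(2)
    w1     : nested list, input-to-hidden weights, w1[i][j] = w_ij^(1)
    b1     : list, hidden biases b_i^(1)
    delta2 : float, output error delta^(2) (assumed precomputed)
    delta1 : list, hidden errors delta_i^(1) (assumed precomputed)
    a0     : list, input activations a_j^(0)
    a1     : list, hidden activations a_i^(1)
    eta    : float, learning rate
    """
    # Output layer: w_i^(2) <- w_i^(2) - eta * delta^(2) * a_i^(1)
    new_w2 = [w2[i] - eta * delta2 * a1[i] for i in range(len(w2))]
    # Output layer: b^(2) <- b^(2) - eta * delta^(2)
    new_b2 = b2 - eta * delta2
    # Hidden layer: w_ij^(1) <- w_ij^(1) - eta * delta_i^(1) * a_j^(0)
    new_w1 = [
        [w1[i][j] - eta * delta1[i] * a0[j] for j in range(len(a0))]
        for i in range(len(w1))
    ]
    # Hidden layer: b_i^(1) <- b_i^(1) - eta * delta_i^(1)
    new_b1 = [b1[i] - eta * delta1[i] for i in range(len(b1))]
    return new_w2, new_b2, new_w1, new_b1


# Illustrative numbers only: two hidden nodes, one input, one output.
w2, b2, w1, b1 = gradient_descent_step(
    w2=[1.0, 1.0], b2=0.0,
    w1=[[0.0], [0.0]], b1=[0.0, 0.0],
    delta2=0.5, delta1=[1.0, -1.0],
    a0=[2.0], a1=[1.0, 2.0],
    eta=0.5,
)
print(w2, b2, w1, b1)
```

The function returns new parameters instead of modifying its arguments in place, so the values before and after a step can be compared directly.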