We now extend our simple model to a network with one hidden layer (see graph), still using scalar variables only.
The output variable is now $a_2$, while $a_1$ is the output of the hidden node and $a_0=x$ is the input. We then have
$$
z_1 = w_1a_0+b_1 \hspace{0.1cm} \wedge \hspace{0.1cm} a_1 = \sigma_1(z_1),
$$

$$
z_2 = w_2a_1+b_2 \hspace{0.1cm} \wedge \hspace{0.1cm} a_2 = \sigma_2(z_2),
$$

and the cost function

$$
C(x;\boldsymbol{\Theta})=\frac{1}{2}(a_2-y)^2,
$$

with $\boldsymbol{\Theta}=[w_1,w_2,b_1,b_2]$.
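The forward pass and cost defined above can be sketched in a few lines of Python. This is a minimal illustration, not the notes' own implementation: the function names `sigmoid`, `forward` and `cost` are hypothetical, and the logistic sigmoid is only an assumed default for both $\sigma_1$ and $\sigma_2$ (any scalar activation can be passed in instead). The parameter list `theta` follows the ordering $\boldsymbol{\Theta}=[w_1,w_2,b_1,b_2]$.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function; assumed here as the default for sigma_1 and sigma_2."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, theta, sigma1=sigmoid, sigma2=sigmoid):
    """Forward pass of the scalar network with one hidden node.

    theta = [w1, w2, b1, b2], same ordering as Theta in the text.
    Returns the tuple (z1, a1, z2, a2).
    """
    w1, w2, b1, b2 = theta
    a0 = x               # input: a_0 = x
    z1 = w1 * a0 + b1    # hidden-node pre-activation
    a1 = sigma1(z1)      # hidden-node output
    z2 = w2 * a1 + b2    # output-node pre-activation
    a2 = sigma2(z2)      # network output
    return z1, a1, z2, a2


def cost(x, y, theta, sigma1=sigmoid, sigma2=sigmoid):
    """Squared-error cost C(x; Theta) = (a_2 - y)^2 / 2."""
    *_, a2 = forward(x, theta, sigma1, sigma2)
    return 0.5 * (a2 - y) ** 2


if __name__ == "__main__":
    # With all parameters zero, z1 = z2 = 0, so a1 = a2 = sigmoid(0) = 0.5.
    print(forward(1.0, [0.0, 0.0, 0.0, 0.0]))
    print(cost(1.0, 1.0, [0.0, 0.0, 0.0, 0.0]))
```

With all parameters set to zero and $x=1$, both activations equal $0.5$, so for a target $y=1$ the cost is $\frac{1}{2}(0.5-1)^2 = 0.125$. Keeping every intermediate value $(z_1, a_1, z_2, a_2)$ in the return value is a deliberate choice: these are exactly the quantities needed when the derivatives of $C$ with respect to $\boldsymbol{\Theta}$ are computed with the chain rule.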