More technicalities

The left-hand side and right-hand side of (8) must be computed separately, and the neural network must then choose the weights and biases, contained in \( P \), such that the two sides match as closely as possible. This means that the absolute or squared difference between the sides must be as close to zero as possible, ideally equal to zero. In this case, the squared difference turns out to be an appropriate measure of how erroneous the trial solution is with respect to the parameters \( P \) of the neural network.

This gives the following cost function that our neural network must minimize:

$$ \min_{P}\Big\{ \big(g_t'(x, P) - \big( -\gamma g_t(x, P) \big)\big)^2 \Big\} $$

(the notation \( \min_{P}\{ f(x, P) \} \) means that we seek the \( P \) that yields the minimum of \( f(x, P) \))

or, in terms of weights and biases for the hidden and output layer in our network:

$$ \min_{P_{\text{hidden} }, \ P_{\text{output} }}\Big\{ \big(g_t'(x, \{ P_{\text{hidden} }, P_{\text{output} }\}) - \big( -\gamma g_t(x, \{ P_{\text{hidden} }, P_{\text{output} }\}) \big)\big)^2 \Big\} $$

for an input value \( x \).
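
As a concrete illustration, the sketch below evaluates this cost for a single-hidden-layer network in Python using autograd. The trial-solution form \( g_t(x, P) = g_0 + x N(x, P) \), the decay rate gamma, the initial value g0, and the parameter shapes are assumptions made here for illustration; autograd's elementwise_grad supplies the derivative \( g_t'(x, P) \).

```python
import autograd.numpy as np
from autograd import elementwise_grad

gamma = 2.0  # decay rate in g'(x) = -gamma g(x); value assumed for illustration
g0 = 1.0     # initial condition g(0) = g0; value assumed for illustration

def neural_network(x, P):
    # Single hidden layer; P = (w_hidden, b_hidden, w_output, b_output).
    w_h, b_h, w_o, b_o = P
    z_h = np.reshape(x, (-1, 1)) * w_h + b_h  # (n_points, n_hidden) pre-activations
    a_h = np.tanh(z_h)                        # hidden-layer activations
    return np.dot(a_h, w_o) + b_o             # one scalar output per input point

def g_trial(x, P):
    # Assumed trial-solution form g_t(x, P) = g0 + x * N(x, P),
    # which satisfies g_t(0) = g0 by construction.
    return g0 + x * neural_network(x, P)

# Derivative of the trial solution with respect to x, taken elementwise
# over the input points: the g_t'(x, P) term in the cost function above.
g_trial_dx = elementwise_grad(g_trial, 0)

def cost(P, x):
    # Squared residual (g_t'(x, P) - (-gamma g_t(x, P)))^2,
    # averaged over the training points x.
    residual = g_trial_dx(x, P) - (-gamma * g_trial(x, P))
    return np.mean(residual ** 2)

# Example: evaluate the cost for random parameters at ten points in [0, 1].
rng = np.random.RandomState(0)
n_hidden = 10
P = (rng.randn(n_hidden), rng.randn(n_hidden),
     rng.randn(n_hidden), rng.randn())
x = np.linspace(0, 1, 10)
print(cost(P, x))  # large for random P; training drives it toward zero
```

Training would then amount to minimizing this cost with respect to \( P \), for instance by gradient descent using autograd's grad(cost, 0), driving the squared residual toward zero as required above.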