If the neural network evaluates \( g_t(x, P) \) at several values of \( x \), say \( N \) values \( x_i \) for \( i = 1, \dots, N \), then the total error to minimize becomes
$$ \begin{equation} \tag{9} \min_{P}\Big\{\frac{1}{N} \sum_{i=1}^N \big(g_t'(x_i, P) - ( -\gamma g_t(x_i, P) ) \big)^2 \Big\} \end{equation} $$

Letting \( \boldsymbol{x} \) be a vector with elements \( x_i \) and \( C(\boldsymbol{x}, P) = \frac{1}{N} \sum_{i=1}^N \big(g_t'(x_i, P) - ( -\gamma g_t(x_i, P) ) \big)^2 \) denote the cost function, the minimization problem that our network must solve becomes
$$ \min_{P} C(\boldsymbol{x}, P) $$

In terms of \( P_{\text{hidden} } \) and \( P_{\text{output} } \), this could also be expressed as
$$ \min_{P_{\text{hidden} }, \ P_{\text{output} }} C(\boldsymbol{x}, \{P_{\text{hidden} }, P_{\text{output} }\}) $$
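To make the cost function concrete, here is a minimal Python sketch that evaluates \( C(\boldsymbol{x}, \{P_{\text{hidden}}, P_{\text{output}}\}) \) for a given set of parameters. Since this section does not fix them, it assumes a trial solution of the form \( g_t(x, P) = g_0 + x N(x, P) \), a network \( N \) with a single hidden layer of sigmoid neurons, and a hypothetical parameter layout: `p_hidden` is a list of (weight, bias) pairs and `p_output` is a pair of output weights and an output bias. The derivative \( g_t' \) is computed analytically with the product rule rather than by automatic differentiation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def network(x, p_hidden, p_output):
    """Return N(x, P) and dN/dx for a single input x.

    Hypothetical parameter layout (assumption, not from the text):
      p_hidden: list of (w_j, b_j) pairs, one per hidden sigmoid neuron
      p_output: (list of output weights v_j, output bias b_out)
    """
    v, b_out = p_output
    n_val = b_out
    dn_val = 0.0
    for v_j, (w_j, b_j) in zip(v, p_hidden):
        s = sigmoid(w_j * x + b_j)
        n_val += v_j * s
        dn_val += v_j * s * (1.0 - s) * w_j  # sigmoid'(z) = s(1 - s), chain rule gives factor w_j
    return n_val, dn_val


def trial(x, p_hidden, p_output, g0):
    """Assumed trial solution g_t = g0 + x N(x, P) and its derivative g_t'."""
    n_val, dn_val = network(x, p_hidden, p_output)
    g_t = g0 + x * n_val
    dg_t = n_val + x * dn_val  # product rule: d/dx [x N(x)] = N(x) + x N'(x)
    return g_t, dg_t


def cost(xs, p_hidden, p_output, gamma, g0):
    """C(x, {P_hidden, P_output}): mean squared residual of g' = -gamma g."""
    total = 0.0
    for x in xs:
        g_t, dg_t = trial(x, p_hidden, p_output, g0)
        total += (dg_t - (-gamma * g_t)) ** 2
    return total / len(xs)


if __name__ == "__main__":
    xs = [i / 9 for i in range(10)]  # N = 10 points in [0, 1]
    p_hidden = [(1.0, 0.0), (-0.5, 0.2)]
    # With all output weights and the output bias at zero, N(x, P) = 0, so
    # g_t = g0, g_t' = 0, and every residual equals gamma * g0 = 20.
    print(cost(xs, p_hidden, ([0.0, 0.0], 0.0), gamma=2.0, g0=10.0))  # 400.0
    print(cost(xs, p_hidden, ([0.3, -0.7], 0.1), gamma=2.0, g0=10.0))
```

Training the network then amounts to minimizing `cost` over `p_hidden` and `p_output`, for instance with gradient descent; this sketch stops at evaluating the cost for one fixed choice of parameters.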