Minimization process

To define the minimization, we first need a cost function to minimize.

According to (1), \( f\left(x, \, g(x), \, g'(x), \, g''(x), \, \dots \, , \, g^{(n)}(x)\right) \) should be equal to zero. We can therefore use the mean squared error as the cost function. For a single input \( x \), the mean contains only one term, so the cost function is just \( f \) squared. The cost function \( C\left(x, P \right) \) can therefore be expressed as

$$ C\left(x, P\right) = \big(f\left(x, \, g(x), \, g'(x), \, g''(x), \, \dots \, , \, g^{(n)}(x)\right)\big)^2 $$

If \( N \) inputs are given as a vector \( \boldsymbol{x} \) with elements \( x_i \) for \( i = 1,\dots,N \), the cost function becomes

$$ \begin{equation} \tag{3} C\left(\boldsymbol{x}, P\right) = \frac{1}{N} \sum_{i=1}^N \big(f\left(x_i, \, g(x_i), \, g'(x_i), \, g''(x_i), \, \dots \, , \, g^{(n)}(x_i)\right)\big)^2 \end{equation} $$
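
As a minimal sketch of how (3) can be evaluated numerically, the example below uses an illustrative first-order equation \( g'(x) = -\gamma g(x) \), so that \( f = g'(x) + \gamma g(x) \). The equation, the value of `GAMMA`, and the one-parameter trial function `trial_g` are assumptions made for this example only; in the text, \( g \) would be built from a neural network with parameters \( P \).

```python
import numpy as np

GAMMA = 2.0  # assumed decay rate in the illustrative ODE g'(x) = -GAMMA * g(x)

def trial_g(x, P):
    """Hypothetical one-parameter trial function g(x) = exp(-P * x)."""
    return np.exp(-P * x)

def trial_dg(x, P):
    """Analytic derivative of the trial function with respect to x."""
    return -P * np.exp(-P * x)

def residual(x, P):
    """f(x, g, g') = g'(x) + GAMMA * g(x), which vanishes for an exact solution."""
    return trial_dg(x, P) + GAMMA * trial_g(x, P)

def cost(x, P):
    """Mean of the squared residual over all inputs, as in equation (3)."""
    return np.mean(residual(x, P) ** 2)

x = np.linspace(0.0, 1.0, 11)  # N = 11 inputs x_i
print(cost(x, 1.0))    # positive: P = 1 does not satisfy the ODE
print(cost(x, GAMMA))  # vanishes: P = GAMMA reproduces the exact solution
```

With a single input, `np.mean` averages over one term, so the same function also covers the one-input case where the cost is just \( f \) squared.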

The neural network should then find the parameters \( P \) that minimize the cost function in (3) for a set of \( N \) training samples \( x_i \).
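
The search for \( P \) can be illustrated with a deliberately small stand-in for network training. The sketch below assumes the illustrative equation \( g'(x) = -\gamma g(x) \), a hypothetical trial function \( g(x) = \exp(-P x) \) with a single scalar parameter, and plain gradient descent with a finite-difference gradient. None of these choices come from the text; a real neural network would have many parameters and would typically obtain the gradient by automatic differentiation.

```python
import numpy as np

GAMMA = 2.0  # assumed decay rate in the illustrative ODE g'(x) = -GAMMA * g(x)

def cost(x, P):
    """Equation (3) for the hypothetical trial function g(x) = exp(-P * x)."""
    g = np.exp(-P * x)
    dg = -P * g  # analytic derivative g'(x)
    return np.mean((dg + GAMMA * g) ** 2)

def grad_cost(x, P, eps=1e-6):
    """Central finite-difference estimate of dC/dP."""
    return (cost(x, P + eps) - cost(x, P - eps)) / (2.0 * eps)

x = np.linspace(0.0, 1.0, 11)  # N = 11 training samples x_i
P = 0.5                        # arbitrary initial guess
learning_rate = 0.2            # assumed step size

# Plain gradient descent: repeatedly step against the gradient of the cost.
for _ in range(500):
    P -= learning_rate * grad_cost(x, P)

print(P, cost(x, P))  # P should end up close to GAMMA, with a cost near zero
```

In this toy setting the minimizer is known to be \( P = \gamma \), which makes it easy to check that the descent loop behaves as intended before moving to a full network.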