Gradients

To minimize the cost function, we need its gradient with respect to each parameter. Differentiating term by term, we find that

$$ \begin{align*} \frac{\partial \mathcal{C}(\{ \Theta_i\})}{\partial \Theta_i} &= \left\langle \frac{\partial E(\boldsymbol{x}; \Theta_i)}{\partial \Theta_i} \right\rangle_{data} + \frac{\partial \log Z(\{ \Theta_i\})}{\partial \Theta_i} \\ &= \langle O_i(\boldsymbol{x}) \rangle_{data} - \langle O_i(\boldsymbol{x}) \rangle_{model}. \end{align*} $$
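As a sanity check, the expression above can be verified numerically on a model small enough to enumerate. The following is a minimal sketch, not a definitive implementation. It assumes an energy that is linear in the parameters, $E(\boldsymbol{x}; \Theta) = \sum_i \Theta_i O_i(\boldsymbol{x})$, with the simplest possible choice $O_i(\boldsymbol{x}) = x_i$ on binary units, and $Z = \sum_{\boldsymbol{x}} e^{-E(\boldsymbol{x}; \Theta)}$. These choices, the toy data set, and the function names (`all_states`, `log_partition`, `cost`, `gradient`) are assumptions made for illustration only.

```python
import itertools
import numpy as np


def all_states(n):
    """All 2**n binary configurations, one per row (feasible only for small n)."""
    return np.array(list(itertools.product([0, 1], repeat=n)), dtype=float)


def log_partition(theta):
    """log Z = log sum_x exp(-E(x)), with the assumed energy E(x) = theta . x."""
    neg_energy = -(all_states(len(theta)) @ theta)
    shift = neg_energy.max()  # subtract the maximum for numerical stability
    return shift + np.log(np.exp(neg_energy - shift).sum())


def cost(theta, data):
    """Cost C = <E>_data + log Z."""
    return (data @ theta).mean() + log_partition(theta)


def gradient(theta, data):
    """Gradient <O_i>_data - <O_i>_model, with the assumed O_i(x) = x_i."""
    states = all_states(len(theta))
    neg_energy = -(states @ theta)
    prob = np.exp(neg_energy - neg_energy.max())
    prob /= prob.sum()  # Boltzmann probabilities p(x) = exp(-E(x)) / Z
    model_mean = prob @ states
    data_mean = data.mean(axis=0)
    return data_mean - model_mean


# Hypothetical toy data and parameters, chosen only to exercise the functions.
theta = np.array([0.3, -0.7, 1.2])
data = np.array([[1, 0, 1], [0, 0, 1], [1, 1, 0], [1, 0, 0]], dtype=float)

# Central finite differences of the cost, to compare with the analytic gradient.
eps = 1e-6
numeric = np.array([
    (cost(theta + eps * e, data) - cost(theta - eps * e, data)) / (2 * eps)
    for e in np.eye(len(theta))
])
print("max deviation:", np.abs(gradient(theta, data) - numeric).max())
```

The gradient vanishes exactly when the model expectations $\langle O_i \rangle_{model}$ match the data expectations $\langle O_i \rangle_{data}$. The exact sum over states used here grows as $2^n$, so for realistic models the model expectation has to be estimated by sampling instead.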