Goals

The goal of the hidden layer is to increase the model's expressive power. We encode complex interactions between visible variables by introducing additional, hidden variables that interact with the visible degrees of freedom in a simple manner, yet still reproduce the complex correlations between the visible degrees of freedom in the data once the hidden variables are marginalized over (integrated out).
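Concretely, assuming the joint distribution over visible and hidden units takes the usual Boltzmann form \( p(\mathbf{x},\mathbf{h}) = e^{-E(\mathbf{x},\mathbf{h})}/Z \) (an assumption stated here for illustration; the energy function is specified by the parameters listed below), marginalizing over the hidden units gives a distribution over the visible units alone:

\[
p(\mathbf{x}) = \sum_{\mathbf{h}} p(\mathbf{x},\mathbf{h}) = \frac{1}{Z}\sum_{\mathbf{h}} e^{-E(\mathbf{x},\mathbf{h})}.
\]

The sum over all hidden configurations induces effective correlations among the visible units, even though each hidden unit couples to the visible units only through simple pairwise terms.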

The network parameters, to be optimized/learned:
  1. \( \mathbf{a} \) represents the visible bias, a vector of the same length as \( \mathbf{x} \).
  2. \( \mathbf{b} \) represents the hidden bias, a vector of the same length as \( \mathbf{h} \).
  3. \( W \) represents the interaction weights, a matrix of size \( M\times N \).
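
As a small illustration (not part of the notes themselves), the three parameter sets can be set up as NumPy arrays. The sizes \( M \) and \( N \) and the Gaussian initialization scale are arbitrary choices here; the only constraint taken from the list above is that \( W \) has shape \( M\times N \), with \( \mathbf{a} \) matching the visible layer and \( \mathbf{b} \) the hidden layer.

```python
import numpy as np

# Minimal sketch: random initialization of the RBM parameters a, b, W.
# Assumes M visible units (length of x) and N hidden units (length of h),
# consistent with W being an M x N matrix. Sizes and scale are illustrative.

rng = np.random.default_rng(seed=42)

M, N = 4, 3  # example sizes: 4 visible units, 3 hidden units

a = rng.normal(loc=0.0, scale=0.01, size=M)       # visible bias, same length as x
b = rng.normal(loc=0.0, scale=0.01, size=N)       # hidden bias, same length as h
W = rng.normal(loc=0.0, scale=0.01, size=(M, N))  # interaction weights, M x N

print(a.shape, b.shape, W.shape)  # (4,) (3,) (4, 3)
```

Initializing the weights with small random values (rather than zeros) is a common choice so that the hidden units do not start out identical; the exact scheme is a modeling decision, not something fixed by the parameter definitions above.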