Solving the equations

We can now use the Newton-Raphson method or gradient descent to solve the equations. With gradient descent, the updates read $$ b \leftarrow b -\eta \frac{\partial C}{\partial b}, $$ and $$ \boldsymbol{w} \leftarrow \boldsymbol{w} -\eta \frac{\partial C}{\partial \boldsymbol{w}}, $$ where \( \eta \) is our by now well-known learning rate. The minus sign moves the parameters against the gradient, so that the cost function \( C \) decreases.
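As a minimal sketch of these updates, the code below runs plain gradient descent on a small, hand-made two-class data set. It assumes a perceptron-style cost \( C = -\sum_{i\in M} y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b) \), summed over the set \( M \) of misclassified points, and it steps against the gradient so that \( C \) decreases. The function name `perceptron_gd`, the toy data, and the default values of `eta`, `max_iter` and `seed` are illustrative choices only, not part of the text above.

```python
import numpy as np

def perceptron_gd(X, y, eta=0.1, max_iter=1000, seed=0):
    """Gradient descent on the assumed cost C = -sum_{i in M} y_i (w^T x_i + b),
    where M is the set of currently misclassified points."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])   # random starting weights
    b = 0.0
    for it in range(max_iter):
        margins = y * (X @ w + b)
        mis = margins <= 0            # boolean mask of misclassified points
        if not mis.any():
            return w, b, it           # every point is on the correct side
        # Gradients of the assumed cost with respect to w and b
        grad_w = -(y[mis, None] * X[mis]).sum(axis=0)
        grad_b = -y[mis].sum()
        # Step against the gradient
        w = w - eta * grad_w
        b = b - eta * grad_b
    return w, b, max_iter             # no separating line found within max_iter

# Hand-made, linearly separable toy data (hypothetical values)
X = np.array([[2.0, 2.0], [3.0, 1.5], [2.5, 3.0],
              [-2.0, -1.0], [-1.5, -2.5], [-3.0, -2.0]])
y = np.array([1, 1, 1, -1, -1, -1])

w, b, n_iter = perceptron_gd(X, y)
print(f"stopped after {n_iter} iterations: w = {w}, b = {b}")
```

Calling `perceptron_gd(X, y, seed=1)` starts from different initial weights and will in general stop at a different line, since the loop ends as soon as no point is misclassified.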

There are, however, problems with this approach, although it looks straightforward to implement. If the data separate into two distinct classes, we may end up with many possible separating lines, as indicated in the figure and shown by running the following program. For small gaps between the entries, we may also need many iterations before the solution converges, and if the data cannot be separated properly into two distinct classes, we may not reach convergence at all.