First attempt at a minimization approach

How do we find the parameter \( b \) and the vector \( \boldsymbol{w} \)? One option is to define a cost function that sums over the set \( M \) of all misclassified points, and then attempt to minimize it: $$ C(\boldsymbol{w},b) = -\sum_{i\in M} y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b). $$
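
As a rough illustration, the cost function can be evaluated with NumPy as sketched below. The function name `perceptron_cost`, the toy data and the choice of \( \boldsymbol{w} \) and \( b \) are made up for this example. A point is treated as misclassified when \( y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b) < 0 \), which is one possible convention.

```python
import numpy as np

def perceptron_cost(w, b, X, y):
    """Evaluate C(w, b) = -sum_{i in M} y_i (w^T x_i + b).

    X has one sample per row and y holds labels +1 or -1.
    Returns the cost and a boolean mask marking the set M.
    """
    margins = y * (X @ w + b)      # y_i (w^T x_i + b) for every sample
    misclassified = margins < 0    # assumed convention for membership in M
    cost = -np.sum(margins[misclassified])
    return cost, misclassified

# Hypothetical toy data: two points per class in two dimensions
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# An arbitrary (poor) guess for the parameters
w = np.array([1.0, -1.0])
b = 0.0

cost, misclassified = perceptron_cost(w, b, X, y)
print(cost, misclassified)
```

With this guess the second and third points end up on the wrong side of the line, so only they contribute to the cost.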

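We could, for example, define a point with \( y_i=1 \) as misclassified if \( \boldsymbol{w}^T\boldsymbol{x}_i+b < 0 \), and a point with \( y_i=-1 \) as misclassified if \( \boldsymbol{w}^T\boldsymbol{x}_i+b > 0 \). In both cases \( y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b) < 0 \), so every term in the sum is negative and \( C \) is non-negative, vanishing when no points are misclassified. Taking the derivatives, with the set \( M \) held fixed, gives us $$ \frac{\partial C}{\partial b} = -\sum_{i\in M} y_i, $$ and $$ \frac{\partial C}{\partial \boldsymbol{w}} = -\sum_{i\in M} y_i\boldsymbol{x}_i. $$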
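
These derivatives can be plugged into a plain gradient descent loop. The following is a minimal sketch under stated assumptions, not a definitive implementation: the names `perceptron_gradients` and `fit`, the learning rate `eta`, the iteration cap `max_iter` and the toy data are all introduced here for illustration. Points with \( y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b) = 0 \) are also counted as misclassified, an added assumption that lets the iteration start from \( \boldsymbol{w}=0 \) and \( b=0 \).

```python
import numpy as np

def perceptron_gradients(w, b, X, y):
    """Gradients of C(w, b) = -sum_{i in M} y_i (w^T x_i + b), with M held fixed."""
    # Assumption: boundary points (margin exactly zero) are included in M
    M = y * (X @ w + b) <= 0
    grad_b = -np.sum(y[M])        # dC/db = -sum_{i in M} y_i
    grad_w = -(y[M] @ X[M])       # dC/dw = -sum_{i in M} y_i x_i
    return grad_w, grad_b, M

def fit(X, y, eta=0.1, max_iter=1000):
    """Gradient descent on C; eta and max_iter are illustrative defaults."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_iter):
        grad_w, grad_b, M = perceptron_gradients(w, b, X, y)
        if not M.any():           # no misclassified points left
            break
        w -= eta * grad_w
        b -= eta * grad_b
    return w, b

# Hypothetical, linearly separable toy data
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w, b = fit(X, y)
print(w, b)
print(y * (X @ w + b))  # positive entries correspond to correctly classified points
```

Since \( M \) changes as \( \boldsymbol{w} \) and \( b \) are updated, the set of misclassified points is recomputed in every iteration before the gradients are evaluated.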