A better approach

A better approach is instead to define a large margin between the two classes (assuming they are well separated to begin with).

Thus, we wish to find a margin \( M \) with \( \boldsymbol{w} \) normalized to \( \vert\vert \boldsymbol{w}\vert\vert =1 \), subject to the condition $$ y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b) \geq M \hspace{0.1cm}\forall i=1,2,\dots, n. $$ Each point then lies at a signed distance of at least \( M \) from the decision boundary defined by the line \( L \). The parameters \( b \), \( w_1 \), and \( w_2 \) define this line.
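As a concrete illustration, the signed distances \( y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b) \) can be computed directly for a candidate boundary. The following is a minimal sketch in Python; the toy data, weights, and bias are arbitrary choices for illustration, not part of the derivation.

```python
import numpy as np

# Toy two-dimensional data: two points per class (arbitrary illustration).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# A candidate boundary w^T x + b = 0, with w normalized to ||w|| = 1.
w = np.array([1.0, 1.0])
w /= np.linalg.norm(w)
b = 0.0

# Signed distances y_i (w^T x_i + b); every one must be at least M.
distances = y * (X @ w + b)
print(distances)        # approx. [2.83, 4.24, 1.41, 3.54]
print(distances.min())  # the margin M achieved by this particular boundary
```

The smallest of these signed distances is the margin achieved by this particular choice of \( \boldsymbol{w} \) and \( b \); maximizing it over all boundaries is the task at hand.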

We thus seek the largest value \( M \) defined by $$ \frac{1}{\vert \vert \boldsymbol{w}\vert\vert}y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b) \geq M \hspace{0.1cm}\forall i=1,2,\dots, n, $$ or just $$ y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b) \geq M\vert \vert \boldsymbol{w}\vert\vert \hspace{0.1cm}\forall i. $$ If we scale the equation so that \( \vert \vert \boldsymbol{w}\vert\vert = 1/M \), we have to find the minimum of \( \boldsymbol{w}^T\boldsymbol{w}=\vert \vert \boldsymbol{w}\vert\vert^2 \) (the squared norm, which has the same minimizer as the norm itself) subject to the condition $$ y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b) \geq 1 \hspace{0.1cm}\forall i. $$
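To see this rescaled problem at work, one can hand it to a generic constrained optimizer. The sketch below uses scipy.optimize.minimize with the SLSQP method on the same kind of toy data; this is only an illustration of the optimization problem, not the dedicated solvers one would use for support vector machines in practice.

```python
import numpy as np
from scipy.optimize import minimize

# Toy separable data (arbitrary illustration).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Unknowns packed as theta = (w_1, w_2, b).
def objective(theta):
    w = theta[:2]
    return 0.5 * w @ w  # minimize ||w||^2 (same minimizer as ||w||)

def constraint(theta):
    w, b = theta[:2], theta[2]
    return y * (X @ w + b) - 1.0  # y_i (w^T x_i + b) - 1 >= 0 for all i

res = minimize(objective, x0=np.array([1.0, 1.0, 0.0]),
               constraints={'type': 'ineq', 'fun': constraint},
               method='SLSQP')

w, b = res.x[:2], res.x[2]
print('w =', w, 'b =', b)
print('margin M = 1/||w|| =', 1.0 / np.linalg.norm(w))
```

For this data the optimizer should recover \( \boldsymbol{w} \approx (1/3, 1/3) \) and \( b \approx -1/3 \), giving a margin \( M = 1/\vert\vert\boldsymbol{w}\vert\vert \approx 2.12 \), with the constraints active for the points closest to the boundary.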

We have thus defined our margin as the inverse of the norm of \( \boldsymbol{w} \). We want to minimize the norm in order to obtain as large a margin \( M \) as possible. Before we proceed, we need to remind ourselves about Lagrangian multipliers.