A better approach

A better approach is instead to try to define a large margin between the two classes (provided they are well separated to begin with).

Thus, we wish to find a margin M , with \boldsymbol{w} normalized to \vert\vert \boldsymbol{w}\vert\vert =1 , subject to the condition y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b) \geq M \hspace{0.1cm}\forall i=1,2,\dots, n. Since \vert\vert \boldsymbol{w}\vert\vert =1 , all points are thus at a signed distance of at least M from the decision boundary defined by the line L . The parameters b , w_1 and w_2 define this line.
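As a small numerical sketch of the signed distance (the weights, intercept and data points below are made up for illustration), we can compute y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b)/\vert\vert\boldsymbol{w}\vert\vert for a couple of labeled points:

```python
import numpy as np

# Hypothetical weights and intercept defining the line w^T x + b = 0
w = np.array([1.0, 1.0])
b = -3.0

# Two example points with labels y_i = +1 and y_i = -1
X = np.array([[2.0, 2.0],
              [0.5, 1.0]])
y = np.array([1, -1])

# Signed distance of each point from the decision boundary;
# a positive value means the point is correctly classified
signed_dist = y * (X @ w + b) / np.linalg.norm(w)
```

Both signed distances come out positive here, so both points lie on the correct side of the line; the smallest such distance over all points is the margin.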

We seek thus the largest value M defined by \frac{1}{\vert \vert \boldsymbol{w}\vert\vert}y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b) \geq M \hspace{0.1cm}\forall i=1,2,\dots, n, or just y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b) \geq M\vert \vert \boldsymbol{w}\vert\vert \hspace{0.1cm}\forall i. If we scale the equation so that \vert \vert \boldsymbol{w}\vert\vert = 1/M , we have to find the minimum of \boldsymbol{w}^T\boldsymbol{w}=\vert \vert \boldsymbol{w}\vert\vert^2 (the squared norm) subject to the condition y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b) \geq 1 \hspace{0.1cm}\forall i.
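This constrained problem, minimize \boldsymbol{w}^T\boldsymbol{w} subject to y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b)\geq 1 , can be handed directly to a generic constrained optimizer. The sketch below (with a made-up, linearly separable toy data set) uses SciPy's SLSQP method; the recovered margin is then M = 1/\vert\vert\boldsymbol{w}\vert\vert :

```python
import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data in two dimensions (assumed example data)
X = np.array([[2.0, 2.0], [2.5, 3.0], [3.0, 2.5],   # class +1
              [0.0, 0.0], [0.5, 1.0], [1.0, 0.5]])  # class -1
y = np.array([1, 1, 1, -1, -1, -1])

# Variables packed as v = (w_1, w_2, b); minimize w^T w
def objective(v):
    w = v[:2]
    return w @ w

# One inequality constraint y_i (w^T x_i + b) - 1 >= 0 per data point
constraints = [{'type': 'ineq',
                'fun': lambda v, xi=xi, yi=yi: yi * (v[:2] @ xi + v[2]) - 1.0}
               for xi, yi in zip(X, y)]

res = minimize(objective, x0=np.array([1.0, 1.0, 0.0]),
               constraints=constraints, method='SLSQP')
w_opt, b_opt = res.x[:2], res.x[2]
margin = 1.0 / np.linalg.norm(w_opt)   # the margin M = 1/||w||
```

For problems of realistic size one would not solve the primal problem this way; the Lagrangian (dual) formulation discussed next is what practical SVM solvers build on.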

We have thus defined our margin as the inverse of the norm of \boldsymbol{w} . We want to minimize the norm in order to obtain as large a margin M as possible. Before we proceed, we need to remind ourselves about Lagrangian multipliers.