A better approach

A better approach is instead to define a large margin between the two classes (assuming they are well separated to begin with).

Thus, we wish to find a margin \( M \) with \( \boldsymbol{w} \) normalized to \( \vert\vert \boldsymbol{w}\vert\vert =1 \), subject to the condition $$ y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b) \geq M \hspace{0.1cm}\forall i=1,2,\dots, n. $$ Each point then lies at a signed distance of at least \( M \) from the decision boundary defined by the line \( L \). The parameters \( b \), \( w_1 \), and \( w_2 \) define this line.
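As a concrete illustration, the signed distances \( y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b) \) can be computed directly for a candidate boundary. The following is a minimal sketch in Python; the toy data, weights, and bias are arbitrary choices for illustration, not part of the derivation.

```python
import numpy as np

# Toy two-dimensional data: two points per class (arbitrary illustration).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# A candidate boundary w^T x + b = 0, with w normalized to ||w|| = 1.
w = np.array([1.0, 1.0])
w /= np.linalg.norm(w)
b = 0.0

# Signed distances y_i (w^T x_i + b); every one must be at least M.
distances = y * (X @ w + b)
print(distances)        # approx. [2.83, 4.24, 1.41, 3.54]
print(distances.min())  # the margin M achieved by this particular boundary
```

The smallest of these signed distances is the margin achieved by this particular choice of \( \boldsymbol{w} \) and \( b \); maximizing it over all boundaries is the task at hand.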

We thus seek the largest value \( M \) defined by $$ \frac{1}{\vert \vert \boldsymbol{w}\vert\vert}y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b) \geq M \hspace{0.1cm}\forall i=1,2,\dots, n, $$ or just $$ y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b) \geq M\vert \vert \boldsymbol{w}\vert\vert \hspace{0.1cm}\forall i. $$ If we scale the equation so that \( \vert \vert \boldsymbol{w}\vert\vert = 1/M \), we have to find the minimum of \( \boldsymbol{w}^T\boldsymbol{w}=\vert \vert \boldsymbol{w}\vert\vert^2 \) (the squared norm, which has the same minimizer as the norm itself) subject to the condition $$ y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b) \geq 1 \hspace{0.1cm}\forall i. $$
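To see this rescaled problem at work, one can hand it to a generic constrained optimizer. The sketch below uses scipy.optimize.minimize with the SLSQP method on the same kind of toy data; this is only an illustration of the optimization problem, not the dedicated solvers one would use for support vector machines in practice.

```python
import numpy as np
from scipy.optimize import minimize

# Toy separable data (arbitrary illustration).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Unknowns packed as theta = (w_1, w_2, b).
def objective(theta):
    w = theta[:2]
    return 0.5 * w @ w  # minimize ||w||^2 (same minimizer as ||w||)

def constraint(theta):
    w, b = theta[:2], theta[2]
    return y * (X @ w + b) - 1.0  # y_i (w^T x_i + b) - 1 >= 0 for all i

res = minimize(objective, x0=np.array([1.0, 1.0, 0.0]),
               constraints={'type': 'ineq', 'fun': constraint},
               method='SLSQP')

w, b = res.x[:2], res.x[2]
print('w =', w, 'b =', b)
print('margin M = 1/||w|| =', 1.0 / np.linalg.norm(w))
```

For this data the optimizer should recover \( \boldsymbol{w} \approx (1/3, 1/3) \) and \( b \approx -1/3 \), giving a margin \( M = 1/\vert\vert\boldsymbol{w}\vert\vert \approx 2.12 \), with the constraints active for the points closest to the boundary.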

We have thus defined our margin as the inverse of the norm of \( \boldsymbol{w} \). We want to minimize the norm in order to obtain as large a margin \( M \) as possible. Before we proceed, we need to remind ourselves about Lagrangian multipliers.