Data Analysis and Machine Learning: Support Vector Machines


Setting up the problem

To solve the above problem, we define the following Lagrangian, to be minimized with respect to $\boldsymbol{w}$ and $b$:

${\cal L}(\lambda,b,\boldsymbol{w})=\frac{1}{2}\boldsymbol{w}^T\boldsymbol{w}-\sum_{i=1}^n\lambda_i\left[y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b)-1\right],$

where each $\lambda_i$ is a Lagrange multiplier subject to the condition $\lambda_i \geq 0$.

Taking the derivatives with respect to $b$ and $\boldsymbol{w}$ and setting them to zero, we obtain

$\frac{\partial {\cal L}}{\partial b} = -\sum_{i} \lambda_iy_i=0,$

and

$\frac{\partial {\cal L}}{\partial \boldsymbol{w}} = 0 = \boldsymbol{w}-\sum_{i} \lambda_iy_i\boldsymbol{x}_i.$

Inserting these conditions into the expression for ${\cal L}$, we obtain

${\cal L}=\sum_i\lambda_i-\frac{1}{2}\sum_{i,j=1}^n\lambda_i\lambda_jy_iy_j\boldsymbol{x}_i^T\boldsymbol{x}_j,$

which is to be maximized with respect to the $\lambda_i$, subject to the constraints $\lambda_i\geq 0$ and $\sum_i\lambda_iy_i=0$. In addition, we must satisfy the Karush-Kuhn-Tucker (KKT) condition

$\lambda_i\left[y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b) -1\right] = 0 \quad \forall i.$

If $\lambda_i > 0$, then $y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b)=1$ and we say that $\boldsymbol{x}_i$ is on the boundary.
If $y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b)> 1$, then $\boldsymbol{x}_i$ is not on the boundary and the KKT condition forces $\lambda_i=0$.
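The dual problem and the KKT condition can be checked numerically on a small example. The following is a minimal sketch, not a production solver: it assumes a hypothetical four-point, linearly separable data set and uses SciPy's general-purpose SLSQP optimizer in place of a dedicated quadratic-programming routine. The names `Q`, `neg_dual` and `neg_dual_grad` are introduced here for illustration only.

```python
import numpy as np
from scipy.optimize import minimize

# Toy, linearly separable data (hypothetical values chosen for illustration).
X = np.array([[3.0, 3.0], [4.0, 4.0], [1.0, 1.0], [0.0, 1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n = len(y)

# Q_ij = y_i y_j x_i^T x_j, the matrix appearing in the dual form of L.
Z = y[:, None] * X
Q = Z @ Z.T


def neg_dual(lam):
    # SciPy minimizes, so we return the negative of
    # sum_i lam_i - 1/2 sum_ij lam_i lam_j y_i y_j x_i^T x_j.
    return 0.5 * lam @ Q @ lam - lam.sum()


def neg_dual_grad(lam):
    # Gradient of neg_dual with respect to lam.
    return Q @ lam - np.ones(n)


result = minimize(
    neg_dual,
    x0=np.zeros(n),
    jac=neg_dual_grad,
    method="SLSQP",
    bounds=[(0.0, None)] * n,  # lam_i >= 0
    constraints=[{"type": "eq", "fun": lambda lam: lam @ y, "jac": lambda lam: y}],
    options={"ftol": 1e-10, "maxiter": 500},
)
lam = result.x

# w = sum_i lam_i y_i x_i, from the derivative of L with respect to w.
w = (lam * y) @ X

# On the boundary y_i (w^T x_i + b) = 1, hence b = y_i - w^T x_i for y_i = +-1.
# Weighting by lam_i (a choice made for this sketch) ignores points with lam_i = 0.
b = np.sum(lam * (y - X @ w)) / np.sum(lam)

# KKT check: lam_i [y_i (w^T x_i + b) - 1] should vanish for every i.
margins = y * (X @ w + b)
kkt = lam * (margins - 1.0)

print("lambda  :", np.round(lam, 4))
print("w, b    :", np.round(w, 4), round(float(b), 4))
print("margins :", np.round(margins, 4))
print("KKT     :", np.round(kkt, 6))
```

For this toy set the solution can be worked out by hand: the closest points of opposite class are $(3,3)$ and $(1,1)$, which gives $\boldsymbol{w}=(0.5,0.5)$ and $b=-2$, so the optimizer should return values close to these, up to its numerical tolerance. Only those two points should carry a nonzero $\lambda_i$.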

When $\lambda_i > 0$, the vectors $\boldsymbol{x}_i$ are called support vectors. They are the vectors closest to the line (or hyperplane) and define the margin $M$.
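As a rough illustration of how support vectors can be read off in practice, the sketch below fits scikit-learn's `SVC` with a linear kernel to a hypothetical toy data set. Two assumptions are made here that are not part of the text above: a very large penalty `C` is used to approximate the hard-margin problem (`SVC` solves the soft-margin version), and the margin is taken as $M = 1/||\boldsymbol{w}||$, the distance from the hyperplane to the points satisfying $y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b)=1$.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable data (hypothetical values chosen for illustration).
X = np.array([[3.0, 3.0], [4.0, 4.0], [1.0, 1.0], [0.0, 1.0]])
y = np.array([1, 1, -1, -1])

# A large C approximates the hard-margin classifier (assumption of this sketch).
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w = clf.coef_[0]            # weight vector, available for the linear kernel
b = clf.intercept_[0]       # bias term
support_idx = clf.support_  # indices of the training points with lambda_i > 0
support_vectors = clf.support_vectors_

# Distance from the hyperplane to the closest points, assuming M = 1/||w||.
M = 1.0 / np.linalg.norm(w)

print("support vector indices:", support_idx)
print("support vectors:\n", support_vectors)
print("w, b:", w, b)
print("margin M:", M)
```

The points not listed in `support_idx` have $\lambda_i = 0$; removing them from the training set should leave the fitted line unchanged, which is one way to see that only the support vectors determine the margin.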