Soft optimization problem

This in turn changes our optimization problem to finding the minimum of {\cal L}=\frac{1}{2}\boldsymbol{w}^T\boldsymbol{w}-\sum_{i=1}^n\lambda_i\left[y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b)-(1-\xi_i)\right]+C\sum_{i=1}^n\xi_i-\sum_{i=1}^n\gamma_i\xi_i, subject to y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b)\geq 1-\xi_i \hspace{0.1cm}\forall i, with the requirement \xi_i\geq 0 .
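To make the role of the slack variables concrete, here is a minimal numerical sketch (with a hypothetical plane \boldsymbol{w}, b and toy data, not taken from the text) that evaluates \xi_i=\max(0,\,1-y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b)) and the primal objective \frac{1}{2}\boldsymbol{w}^T\boldsymbol{w}+C\sum_i\xi_i:

```python
import numpy as np

# Hypothetical plane and toy data; the slack xi_i measures by how much
# point i fails the margin condition y_i(w^T x_i + b) >= 1 - xi_i.
w = np.array([1.0, 1.0])
b = -3.0
X = np.array([[3.0, 3.0], [2.0, 1.5], [0.0, 0.0], [1.5, 1.8]])
y = np.array([1.0, 1.0, -1.0, -1.0])
C = 1.0  # penalty parameter weighting the total slack

margins = y * (X @ w + b)            # y_i (w^T x_i + b)
xi = np.maximum(0.0, 1.0 - margins)  # slack: zero for points beyond the margin
primal = 0.5 * w @ w + C * xi.sum()  # objective 1/2 w^T w + C sum_i xi_i
```

Points with \xi_i=0 satisfy the margin strictly; 0<\xi_i\leq 1 means the point lies inside the margin but on the correct side, while \xi_i>1 signals a misclassified point (here the last point, labelled -1 but on the positive side of the plane).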

Taking the derivatives with respect to b, \boldsymbol{w} and \xi_i we obtain \frac{\partial {\cal L}}{\partial b} = -\sum_{i} \lambda_iy_i=0, \frac{\partial {\cal L}}{\partial \boldsymbol{w}} = 0 = \boldsymbol{w}-\sum_{i} \lambda_iy_i\boldsymbol{x}_i, and \lambda_i = C-\gamma_i \hspace{0.1cm}\forall i. Inserting these constraints into the equation for {\cal L} we obtain the same expression as before, {\cal L}=\sum_i\lambda_i-\frac{1}{2}\sum_{ij}^n\lambda_i\lambda_jy_iy_j\boldsymbol{x}_i^T\boldsymbol{x}_j, but now subject to the constraints \sum_i\lambda_iy_i=0 and 0\leq\lambda_i \leq C, where the upper bound follows from \lambda_i=C-\gamma_i with \gamma_i\geq 0. We must in addition satisfy the Karush-Kuhn-Tucker conditions, which now read \lambda_i\left[y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b) -(1-\xi_i)\right]=0 \hspace{0.1cm}\forall i, \gamma_i\xi_i = 0, and y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b) -(1-\xi_i) \geq 0 \hspace{0.1cm}\forall i.
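The dual above is a quadratic program in the \lambda_i with the box constraints 0\leq\lambda_i\leq C and one equality constraint. As a sketch (assuming scipy is available; the data set is a hypothetical toy example with one overlapping point), it can be solved directly with a general-purpose optimizer, after which \boldsymbol{w} follows from the stationarity condition \boldsymbol{w}=\sum_i\lambda_iy_i\boldsymbol{x}_i and b from a margin support vector (for which 0<\lambda_i<C implies \xi_i=0):

```python
import numpy as np
from scipy.optimize import minimize

# Toy data: two classes in 2D, with one -1 point overlapping the +1 cluster
X = np.array([[2.0, 2.0], [2.5, 1.5], [3.0, 3.0],
              [0.0, 0.0], [0.5, 1.0], [2.2, 2.1]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
n = len(y)
C = 1.0  # box-constraint parameter from the soft-margin formulation

K = X @ X.T  # Gram matrix of the linear kernel, x_i^T x_j

def neg_dual(lmbda):
    # Negative of L = sum_i lambda_i - 1/2 sum_ij lambda_i lambda_j y_i y_j x_i^T x_j
    return -(lmbda.sum() - 0.5 * (lmbda * y) @ K @ (lmbda * y))

constraints = {"type": "eq", "fun": lambda l: l @ y}  # sum_i lambda_i y_i = 0
bounds = [(0.0, C)] * n                               # 0 <= lambda_i <= C

res = minimize(neg_dual, np.zeros(n), bounds=bounds,
               constraints=constraints, method="SLSQP")
lmbda = res.x

# Recover w from the stationarity condition w = sum_i lambda_i y_i x_i
w = (lmbda * y) @ X

# b from margin support vectors (0 < lambda_i < C); fall back to all
# support vectors if every multiplier sits at a bound
sv = (lmbda > 1e-6) & (lmbda < C - 1e-6)
if not sv.any():
    sv = lmbda > 1e-6
b = np.mean(y[sv] - X[sv] @ w)
```

The overlapping point typically ends up with \lambda_i=C, the soft-margin signature of a margin violator, while the remaining support vectors sit at intermediate values.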