Data Analysis and Machine Learning: Support Vector Machines

Loading [MathJax]/extensions/TeX/boldsymbol.js

The problem to solve

Using our definition of the kernel We can rewrite again the Lagrangian

${\cal L}=\sum_i\lambda_i-\frac{1}{2}\sum_{ij}^n\lambda_i\lambda_jy_iy_j\boldsymbol{x}_i^T\boldsymbol{z}_j,$ subject to the constraints

$\lambda_i\geq 0$ ,

$\sum_i\lambda_iy_i=0$ in terms of a convex optimization problem

$\frac{1}{2} \boldsymbol{\lambda}^T\begin{bmatrix} y_1y_1K(\boldsymbol{x}_1,\boldsymbol{x}_1) & y_1y_2K(\boldsymbol{x}_1,\boldsymbol{x}_2) & \dots & \dots & y_1y_nK(\boldsymbol{x}_1,\boldsymbol{x}_n) \\ y_2y_1K(\boldsymbol{x}_2,\boldsymbol{x}_1) & y_2y_2(\boldsymbol{x}_2,\boldsymbol{x}_2) & \dots & \dots & y_1y_nK(\boldsymbol{x}_2,\boldsymbol{x}_n) \\ \dots & \dots & \dots & \dots & \dots \\ \dots & \dots & \dots & \dots & \dots \\ y_ny_1K(\boldsymbol{x}_n,\boldsymbol{x}_1) & y_ny_2K(\boldsymbol{x}_n\boldsymbol{x}_2) & \dots & \dots & y_ny_nK(\boldsymbol{x}_n,\boldsymbol{x}_n) \\ \end{bmatrix}\boldsymbol{\lambda}-\mathbb{1}\boldsymbol{\lambda},$ subject to

$\boldsymbol{y}^T\boldsymbol{\lambda}=0$ . Here we defined the vectors

$\boldsymbol{\lambda} =[\lambda_1,\lambda_2,\dots,\lambda_n]$ and

$\boldsymbol{y}=[y_1,y_2,\dots,y_n]$ . If we add the slack constants this leads to the additional constraint

$0\leq \lambda_i \leq C$ .

We can rewrite this (see the solutions below) in terms of a convex optimization problem of the type $\begin{align*} &\mathrm{min}_{\lambda}\hspace{0.2cm} \frac{1}{2}\boldsymbol{\lambda}^T\boldsymbol{P}\boldsymbol{\lambda}+\boldsymbol{q}^T\boldsymbol{\lambda},\\ \nonumber &\mathrm{subject\hspace{0.1cm}to} \hspace{0.2cm} \boldsymbol{G}\boldsymbol{\lambda} \preceq \boldsymbol{h} \hspace{0.2cm} \wedge \boldsymbol{A}\boldsymbol{\lambda}=f. \end{align*}$ Below we discuss how to solve these equations. Here we note that the matrix $\boldsymbol{P}$ has matrix elements $p_{ij}=y_iy_jK(\boldsymbol{x}_i,\boldsymbol{x}_j)$ . Given a kernel $K$ and the targets $y_i$ this matrix is easy to set up. The constraint $\boldsymbol{y}^T\boldsymbol{\lambda}=0$ leads to $f=0$ and $\boldsymbol{A}=\boldsymbol{y}$ . How to set up the matrix $\boldsymbol{G}$ is discussed later. Here note that the inequalities $0\leq \lambda_i \leq C$ can be split up into $0\leq \lambda_i$ and $\lambda_i \leq C$ . These two inequalities define then the matrix $\boldsymbol{G}$ and the vector $\boldsymbol{h}$ .