The equations

Suppose we define a polynomial transformation of degree two (we continue to live in a plane, with x_i and y_i as the components of the point \boldsymbol{x}_i) \boldsymbol{z}_i = \phi(\boldsymbol{x}_i) =\left(x_i^2, y_i^2, \sqrt{2}x_iy_i\right).

With our new basis, the equations we solved earlier keep the same form, that is we have now (without the slack option, for simplicity) {\cal L}=\sum_i\lambda_i-\frac{1}{2}\sum_{ij}^n\lambda_i\lambda_jy_iy_j\boldsymbol{z}_i^T\boldsymbol{z}_j, subject to the constraints \lambda_i\geq 0 , \sum_i\lambda_iy_i=0 , and for the support vectors y_i(\boldsymbol{w}^T\boldsymbol{z}_i+b)= 1 \hspace{0.1cm}\forall i, from which we also find b . To compute \boldsymbol{z}_i^T\boldsymbol{z}_j we define the kernel K(\boldsymbol{x}_i,\boldsymbol{x}_j) as K(\boldsymbol{x}_i,\boldsymbol{x}_j)=\boldsymbol{z}_i^T\boldsymbol{z}_j= \phi(\boldsymbol{x}_i)^T\phi(\boldsymbol{x}_j). For the above example, the kernel reads K(\boldsymbol{x}_i,\boldsymbol{x}_j)=\begin{bmatrix} x_i^2 & y_i^2 & \sqrt{2}x_iy_i \end{bmatrix}\begin{bmatrix} x_j^2 \\ y_j^2 \\ \sqrt{2}x_jy_j \end{bmatrix}=x_i^2x_j^2+2x_ix_jy_iy_j+y_i^2y_j^2.
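As a quick symbolic sanity check of the expansion above, we can let SymPy multiply out \phi(\boldsymbol{x}_i)^T\phi(\boldsymbol{x}_j) (the helper function phi below is our own naming, not part of any SVM library):

```python
import sympy as sp

x_i, y_i, x_j, y_j = sp.symbols('x_i y_i x_j y_j', real=True)

def phi(x, y):
    # the degree-two feature map from the text: (x, y) -> (x^2, y^2, sqrt(2) x y)
    return sp.Matrix([x**2, y**2, sp.sqrt(2) * x * y])

# the kernel as an explicit inner product in feature space
K = (phi(x_i, y_i).T * phi(x_j, y_j))[0]
print(sp.expand(K))  # x_i^2 x_j^2 + 2 x_i x_j y_i y_j + y_i^2 y_j^2
```

The cross term picks up the factor 2 from the \sqrt{2} in the feature map, which is exactly why that factor is included.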

We note that this is nothing but the square of the dot product of the two original vectors, (\boldsymbol{x}_i^T\boldsymbol{x}_j)^2 . Thus, instead of computing the product \boldsymbol{z}_i^T\boldsymbol{z}_j in the Lagrangian, we simply compute the dot product (\boldsymbol{x}_i^T\boldsymbol{x}_j)^2 . This is the so-called kernel trick: the result is the same as if we went through the trouble of performing the transformation \phi(\boldsymbol{x}_i)^T\phi(\boldsymbol{x}_j) during the SVM calculations.
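The kernel trick can be verified numerically: evaluating the explicit feature map and taking the inner product gives the same number as the squared dot product of the original two-dimensional vectors. A minimal sketch with NumPy (the names phi and kernel are our own):

```python
import numpy as np

def phi(x):
    # explicit degree-two feature map: (x1, x2) -> (x1^2, x2^2, sqrt(2) x1 x2)
    return np.array([x[0]**2, x[1]**2, np.sqrt(2.0) * x[0] * x[1]])

def kernel(xi, xj):
    # polynomial kernel of degree two: (x_i^T x_j)^2
    return np.dot(xi, xj)**2

rng = np.random.default_rng(0)
xi, xj = rng.standard_normal(2), rng.standard_normal(2)

lhs = np.dot(phi(xi), phi(xj))  # inner product in feature space
rhs = kernel(xi, xj)            # kernel trick: never leaves the original plane
print(np.isclose(lhs, rhs))    # True
```

The advantage grows with the dimension of the feature space: the kernel evaluation costs the same as an ordinary dot product, while the explicit map would require constructing and multiplying ever longer vectors.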