The equations

Suppose we define a polynomial transformation of degree two (we continue to live in a plane, with x_i and y_i as the components of the point \boldsymbol{x}_i) \boldsymbol{z}_i = \phi(\boldsymbol{x}_i) =\left(x_i^2, y_i^2, \sqrt{2}x_iy_i\right).

With our new basis, the equations we solved earlier keep the same form, that is we have now (without the slack option, for simplicity) {\cal L}=\sum_i\lambda_i-\frac{1}{2}\sum_{ij}^n\lambda_i\lambda_jy_iy_j\boldsymbol{z}_i^T\boldsymbol{z}_j, subject to the constraints \lambda_i\geq 0 , \sum_i\lambda_iy_i=0 , and for the support vectors y_i(\boldsymbol{w}^T\boldsymbol{z}_i+b)= 1 \hspace{0.1cm}\forall i, from which we also find b . To compute \boldsymbol{z}_i^T\boldsymbol{z}_j we define the kernel K(\boldsymbol{x}_i,\boldsymbol{x}_j) as K(\boldsymbol{x}_i,\boldsymbol{x}_j)=\boldsymbol{z}_i^T\boldsymbol{z}_j= \phi(\boldsymbol{x}_i)^T\phi(\boldsymbol{x}_j). For the above example, the kernel reads K(\boldsymbol{x}_i,\boldsymbol{x}_j)=\begin{bmatrix} x_i^2 & y_i^2 & \sqrt{2}x_iy_i \end{bmatrix}\begin{bmatrix} x_j^2 \\ y_j^2 \\ \sqrt{2}x_jy_j \end{bmatrix}=x_i^2x_j^2+2x_ix_jy_iy_j+y_i^2y_j^2.
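As a quick symbolic sanity check of the expansion above, we can let SymPy multiply out \phi(\boldsymbol{x}_i)^T\phi(\boldsymbol{x}_j) (the helper function phi below is our own naming, not part of any SVM library):

```python
import sympy as sp

x_i, y_i, x_j, y_j = sp.symbols('x_i y_i x_j y_j', real=True)

def phi(x, y):
    # the degree-two feature map from the text: (x, y) -> (x^2, y^2, sqrt(2) x y)
    return sp.Matrix([x**2, y**2, sp.sqrt(2) * x * y])

# the kernel as an explicit inner product in feature space
K = (phi(x_i, y_i).T * phi(x_j, y_j))[0]
print(sp.expand(K))  # x_i^2 x_j^2 + 2 x_i x_j y_i y_j + y_i^2 y_j^2
```

The cross term picks up the factor 2 from the \sqrt{2} in the feature map, which is exactly why that factor is included.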

We note that this is nothing but the square of the dot product of the two original vectors, (\boldsymbol{x}_i^T\boldsymbol{x}_j)^2 . Thus, instead of computing the product \boldsymbol{z}_i^T\boldsymbol{z}_j in the Lagrangian, we simply compute the dot product (\boldsymbol{x}_i^T\boldsymbol{x}_j)^2 . This is the so-called kernel trick: the result is the same as if we went through the trouble of performing the transformation \phi(\boldsymbol{x}_i)^T\phi(\boldsymbol{x}_j) during the SVM calculations.
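The kernel trick can be verified numerically: evaluating the explicit feature map and taking the inner product gives the same number as the squared dot product of the original two-dimensional vectors. A minimal sketch with NumPy (the names phi and kernel are our own):

```python
import numpy as np

def phi(x):
    # explicit degree-two feature map: (x1, x2) -> (x1^2, x2^2, sqrt(2) x1 x2)
    return np.array([x[0]**2, x[1]**2, np.sqrt(2.0) * x[0] * x[1]])

def kernel(xi, xj):
    # polynomial kernel of degree two: (x_i^T x_j)^2
    return np.dot(xi, xj)**2

rng = np.random.default_rng(0)
xi, xj = rng.standard_normal(2), rng.standard_normal(2)

lhs = np.dot(phi(xi), phi(xj))  # inner product in feature space
rhs = kernel(xi, xj)            # kernel trick: never leaves the original plane
print(np.isclose(lhs, rhs))    # True
```

The advantage grows with the dimension of the feature space: the kernel evaluation costs the same as an ordinary dot product, while the explicit map would require constructing and multiplying ever longer vectors.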