The equations

Suppose we define a polynomial transformation of degree two only (we continue to live in a plane with \( x_i \) and \( y_i \) as variables) $$ \boldsymbol{z}_i = \phi(\boldsymbol{x}_i) =\left(x_i^2, y_i^2, \sqrt{2}x_iy_i\right). $$
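As a minimal sketch of this mapping (plain NumPy; the function name `phi` is our own choice), we could write:

```python
import numpy as np

def phi(x):
    # Map a point (x, y) in the plane to the degree-two feature space
    return np.array([x[0]**2, x[1]**2, np.sqrt(2)*x[0]*x[1]])

# Example: a point in the plane and its image in the new basis
point = np.array([1.0, 2.0])
print(phi(point))  # [1. 4. 2.82842712]
```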

In this new basis the equations we solved earlier take the same form, that is, we now have (without the slack option for simplicity) $$ {\cal L}=\sum_i\lambda_i-\frac{1}{2}\sum_{ij}^n\lambda_i\lambda_jy_iy_j\boldsymbol{z}_i^T\boldsymbol{z}_j, $$ subject to the constraints \( \lambda_i\geq 0 \), \( \sum_i\lambda_iy_i=0 \), and for the support vectors $$ y_i(\boldsymbol{w}^T\boldsymbol{z}_i+b)= 1 \hspace{0.1cm}\forall i, $$ from which we also find \( b \). To compute \( \boldsymbol{z}_i^T\boldsymbol{z}_j \) we define the kernel \( K(\boldsymbol{x}_i,\boldsymbol{x}_j) \) as $$ K(\boldsymbol{x}_i,\boldsymbol{x}_j)=\boldsymbol{z}_i^T\boldsymbol{z}_j= \phi(\boldsymbol{x}_i)^T\phi(\boldsymbol{x}_j). $$ For the above example, the kernel reads $$ K(\boldsymbol{x}_i,\boldsymbol{x}_j)=\begin{bmatrix} x_i^2 \\ y_i^2 \\ \sqrt{2}x_iy_i \end{bmatrix}^T\begin{bmatrix} x_j^2 \\ y_j^2 \\ \sqrt{2}x_jy_j \end{bmatrix}=x_i^2x_j^2+2x_ix_jy_iy_j+y_i^2y_j^2. $$
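As an illustrative check (not part of any SVM library, just NumPy with the `phi` sketch from above), we can verify that the inner product in the new basis agrees with the expanded expression:

```python
import numpy as np

def phi(x):
    # Degree-two feature map defined earlier
    return np.array([x[0]**2, x[1]**2, np.sqrt(2)*x[0]*x[1]])

xi = np.array([1.0, 2.0])
xj = np.array([3.0, -1.0])

# Kernel as an inner product in the new basis
K_feature = phi(xi) @ phi(xj)
# Kernel from the expanded expression above
K_expanded = xi[0]**2*xj[0]**2 + 2*xi[0]*xj[0]*xi[1]*xj[1] + xi[1]**2*xj[1]**2
print(K_feature, K_expanded)  # both give 1.0
```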

We note that this is nothing but the square of the dot product of the two original vectors, \( (\boldsymbol{x}_i^T\boldsymbol{x}_j)^2 \). Instead of computing the product \( \boldsymbol{z}_i^T\boldsymbol{z}_j \) in the Lagrangian, we simply compute the dot product \( (\boldsymbol{x}_i^T\boldsymbol{x}_j)^2 \) in the original space. This is the so-called kernel trick: the result is the same as if we had gone through the trouble of performing the transformation and evaluating \( \phi(\boldsymbol{x}_i)^T\phi(\boldsymbol{x}_j) \) during the SVM calculations.
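A small sketch of the trick itself (again plain NumPy): evaluating the kernel in the original two-dimensional space gives the same number as mapping first and taking the inner product afterwards:

```python
import numpy as np

def phi(x):
    # Explicit degree-two feature map (the route the kernel trick avoids)
    return np.array([x[0]**2, x[1]**2, np.sqrt(2)*x[0]*x[1]])

def kernel(xi, xj):
    # Kernel trick: stay in the original two-dimensional space
    return (xi @ xj)**2

rng = np.random.default_rng(seed=1)
xi, xj = rng.normal(size=2), rng.normal(size=2)
print(np.isclose(phi(xi) @ phi(xj), kernel(xi, xj)))  # True
```

In practice a library handles this for us; in scikit-learn, for instance, `SVC(kernel='poly', degree=2, gamma=1, coef0=0)` evaluates precisely \( (\boldsymbol{x}_i^T\boldsymbol{x}_j)^2 \).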