Different kernels and Mercer's theorem

Several kernels are in common use. The most popular ones are listed here, with a small implementation sketch following the list:

  1. Linear: \( K(\boldsymbol{x},\boldsymbol{y})=\boldsymbol{x}^T\boldsymbol{y} \),
  2. Polynomial: \( K(\boldsymbol{x},\boldsymbol{y})=(\boldsymbol{x}^T\boldsymbol{y}+\gamma)^d \),
  3. Gaussian Radial Basis Function: \( K(\boldsymbol{x},\boldsymbol{y})=\exp{\left(-\gamma\vert\vert\boldsymbol{x}-\boldsymbol{y}\vert\vert^2\right)} \),
  4. Sigmoid (hyperbolic tangent): \( K(\boldsymbol{x},\boldsymbol{y})=\tanh{(\boldsymbol{x}^T\boldsymbol{y}+\gamma)} \),
and many others.
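As a concrete illustration, here is a minimal NumPy sketch of the four kernels above. The parameter values (\( \gamma=1 \), degree \( d=3 \)) and the sample vectors are arbitrary choices for demonstration, not values prescribed by the text.

```python
import numpy as np

def linear_kernel(x, y):
    """Linear kernel: the plain inner product x^T y."""
    return x @ y

def polynomial_kernel(x, y, gamma=1.0, d=3):
    """Polynomial kernel (x^T y + gamma)^d; gamma and d are example values."""
    return (x @ y + gamma) ** d

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian radial basis function kernel exp(-gamma * ||x - y||^2)."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

def sigmoid_kernel(x, y, gamma=1.0):
    """Sigmoid (hyperbolic tangent) kernel tanh(x^T y + gamma)."""
    return np.tanh(x @ y + gamma)

# Evaluate each kernel on a pair of sample vectors
x = np.array([1.0, 2.0])
y = np.array([0.5, -1.0])
print(linear_kernel(x, y))      # x^T y = -1.5
print(polynomial_kernel(x, y))  # (-1.5 + 1)^3 = -0.125
print(rbf_kernel(x, y))         # exp(-||x - y||^2) = exp(-9.25)
print(sigmoid_kernel(x, y))     # tanh(-0.5)
```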

An important theorem for us is Mercer's theorem. It states that if a kernel function \( K \) is symmetric, continuous, and yields a positive semi-definite kernel matrix \( \boldsymbol{P} \) for every choice of inputs, then there exists a function \( \phi \) that maps \( \boldsymbol{x}_i \) and \( \boldsymbol{x}_j \) into another space (possibly of much higher dimension) such that $$ K(\boldsymbol{x}_i,\boldsymbol{x}_j)=\phi(\boldsymbol{x}_i)^T\phi(\boldsymbol{x}_j). $$ We can therefore use \( K \) as a kernel knowing that \( \phi \) exists, even if we do not know what \( \phi \) is. Note that some frequently used kernels (such as the sigmoid kernel above) do not satisfy all of Mercer's conditions, yet they generally work well in practice.
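Mercer's condition can be probed numerically: assemble the kernel matrix for a set of sample points and inspect its eigenvalues, since a Mercer kernel must give only non-negative eigenvalues (up to floating-point rounding). The sketch below does this for the Gaussian RBF and sigmoid kernels; the sample size, dimension, and parameter choice \( \gamma=1 \) are arbitrary illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))  # 50 random sample points in 3 dimensions

def gram_matrix(kernel, X):
    """Assemble the kernel (Gram) matrix K_ij = kernel(x_i, x_j)."""
    n = len(X)
    return np.array([[kernel(X[i], X[j]) for j in range(n)]
                     for i in range(n)])

rbf = lambda x, y: np.exp(-np.sum((x - y) ** 2))  # gamma = 1
sigmoid = lambda x, y: np.tanh(x @ y + 1.0)       # gamma = 1

for name, kernel in [("RBF", rbf), ("sigmoid", sigmoid)]:
    # eigvalsh applies because the Gram matrix is symmetric
    eigvals = np.linalg.eigvalsh(gram_matrix(kernel, X))
    print(f"{name}: smallest eigenvalue = {eigvals.min():.3e}")
```

For the RBF kernel the smallest eigenvalue should be non-negative up to rounding error, while the sigmoid kernel will typically show clearly negative eigenvalues for parameter choices like these, signalling a violation of Mercer's conditions.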