Different kernels and Mercer's theorem

Several kernels are in common use. The most popular ones are listed here, with a small implementation sketch following the list:

  1. Linear: \( K(\boldsymbol{x},\boldsymbol{y})=\boldsymbol{x}^T\boldsymbol{y} \),
  2. Polynomial: \( K(\boldsymbol{x},\boldsymbol{y})=(\boldsymbol{x}^T\boldsymbol{y}+\gamma)^d \),
  3. Gaussian Radial Basis Function: \( K(\boldsymbol{x},\boldsymbol{y})=\exp{\left(-\gamma\vert\vert\boldsymbol{x}-\boldsymbol{y}\vert\vert^2\right)} \),
  4. Sigmoid (hyperbolic tangent): \( K(\boldsymbol{x},\boldsymbol{y})=\tanh{(\boldsymbol{x}^T\boldsymbol{y}+\gamma)} \),
and many others.
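As a concrete illustration, here is a minimal NumPy sketch of the four kernels above. The parameter values (\( \gamma=1 \), degree \( d=3 \)) and the sample vectors are arbitrary choices for demonstration, not values prescribed by the text.

```python
import numpy as np

def linear_kernel(x, y):
    """Linear kernel: the plain inner product x^T y."""
    return x @ y

def polynomial_kernel(x, y, gamma=1.0, d=3):
    """Polynomial kernel (x^T y + gamma)^d; gamma and d are example values."""
    return (x @ y + gamma) ** d

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian radial basis function kernel exp(-gamma * ||x - y||^2)."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

def sigmoid_kernel(x, y, gamma=1.0):
    """Sigmoid (hyperbolic tangent) kernel tanh(x^T y + gamma)."""
    return np.tanh(x @ y + gamma)

# Evaluate each kernel on a pair of sample vectors
x = np.array([1.0, 2.0])
y = np.array([0.5, -1.0])
print(linear_kernel(x, y))      # x^T y = -1.5
print(polynomial_kernel(x, y))  # (-1.5 + 1)^3 = -0.125
print(rbf_kernel(x, y))         # exp(-||x - y||^2) = exp(-9.25)
print(sigmoid_kernel(x, y))     # tanh(-0.5)
```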

An important theorem for us is Mercer's theorem. It states that if a kernel function \( K \) is symmetric, continuous, and yields a positive semi-definite kernel matrix \( \boldsymbol{P} \) for every choice of inputs, then there exists a function \( \phi \) that maps \( \boldsymbol{x}_i \) and \( \boldsymbol{x}_j \) into another space (possibly of much higher dimension) such that $$ K(\boldsymbol{x}_i,\boldsymbol{x}_j)=\phi(\boldsymbol{x}_i)^T\phi(\boldsymbol{x}_j). $$ We can therefore use \( K \) as a kernel knowing that \( \phi \) exists, even if we do not know what \( \phi \) is. Note that some frequently used kernels (such as the sigmoid kernel above) do not satisfy all of Mercer's conditions, yet they generally work well in practice.
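Mercer's condition can be probed numerically: assemble the kernel matrix for a set of sample points and inspect its eigenvalues, since a Mercer kernel must give only non-negative eigenvalues (up to floating-point rounding). The sketch below does this for the Gaussian RBF and sigmoid kernels; the sample size, dimension, and parameter choice \( \gamma=1 \) are arbitrary illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))  # 50 random sample points in 3 dimensions

def gram_matrix(kernel, X):
    """Assemble the kernel (Gram) matrix K_ij = kernel(x_i, x_j)."""
    n = len(X)
    return np.array([[kernel(X[i], X[j]) for j in range(n)]
                     for i in range(n)])

rbf = lambda x, y: np.exp(-np.sum((x - y) ** 2))  # gamma = 1
sigmoid = lambda x, y: np.tanh(x @ y + 1.0)       # gamma = 1

for name, kernel in [("RBF", rbf), ("sigmoid", sigmoid)]:
    # eigvalsh applies because the Gram matrix is symmetric
    eigvals = np.linalg.eigvalsh(gram_matrix(kernel, X))
    print(f"{name}: smallest eigenvalue = {eigvals.min():.3e}")
```

For the RBF kernel the smallest eigenvalue should be non-negative up to rounding error, while the sigmoid kernel will typically show clearly negative eigenvalues for parameter choices like these, signalling a violation of Mercer's conditions.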