Some classification problems have no linear decision boundary in the original feature space. If we project the data into a higher-dimensional space, we can often find a hyperplane that separates the classes.

But mapping the data into a high-dimensional space is a computationally expensive operation.

Simple example:

Suppose x = (x1, x2, x3) and y = (y1, y2, y3)

And f(x) = (x1*x1, x1*x2, x1*x3, x2*x1, x2*x2, x2*x3, x3*x1, x3*x2, x3*x3)
f(y) = (y1*y1, y1*y2, y1*y3, y2*y1, y2*y2, y2*y3, y3*y1, y3*y2, y3*y3)
Now suppose we want to calculate the dot product of the two mapped vectors, f(x)*f(y).

Let us plug in some numbers to understand this better.

Let us suppose x = (1,2,3) and y = (1,2,3), and we want to calculate f(x)*f(y).

We first need to find f(x) and f(y).

f(x) = (1,2,3,2,4,6,3,6,9)
f(y)= (1,2,3,2,4,6,3,6,9)
f(x)*f(y)= (1+4+9+4+16+36+9+36+81) =196

So, to evaluate f(x) and f(y), we mapped each 3-dimensional vector to a 9-dimensional one.

Our vectors have only 3 components. If they had 300 components instead, f(x) would have 300*300 = 90,000 components, and calculating it would be computationally very expensive.

Now using the kernel trick:

K(x, y) = (x*y)^2 = (1+4+9)^2 = 14^2 = 196

This calculation is much easier. Computing the dot product through the explicit mapping requires O(n^2) time, because f(x) has n^2 components, whereas the kernel requires just O(n) time.
The kernel helps to classify linearly inseparable data with minimal computation.
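
The comparison above can be checked with a short Python sketch. The function names (feature_map, explicit_product, kernel_product) are made up for illustration; only the numbers come from the example.

```python
from itertools import product


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def feature_map(v):
    """Explicit mapping f(v): all pairwise products v_i * v_j (n*n values)."""
    return [a * b for a, b in product(v, repeat=2)]


def explicit_product(x, y):
    """f(x)*f(y) computed the expensive way, in the n*n-dimensional space."""
    return dot(feature_map(x), feature_map(y))


def kernel_product(x, y):
    """Same value via the kernel trick: (x*y)^2, using only n multiplications."""
    return dot(x, y) ** 2


x = (1, 2, 3)
y = (1, 2, 3)

print(feature_map(x))          # the 9-dimensional vector f(x)
print(explicit_product(x, y))  # dot product in the mapped space
print(kernel_product(x, y))    # kernel value in the original space
```

Both functions return the same number, but kernel_product never builds the 9-dimensional vectors.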

Some commonly used kernel functions:

1. Linear Kernel:
The linear kernel is the simplest kernel function. It is given by the inner product plus an optional constant c: K(x, y) = x*y + c.

2. Polynomial Kernel:

Here xi and xj are vectors in the input space and d is the degree of the polynomial. The kernel K = (x*y)^2 used in the example above is a polynomial kernel of degree d = 2.

3. Gaussian Kernel (RBF):
K(x, y) = exp(-||x-y||^2 / (2*sigma^2))
||x-y|| is the Euclidean distance between the vector and the landmark. If the vector is near the landmark, the numerator is approximately zero and the kernel value approaches one.
If the vector is far away from the landmark, the Euclidean distance is large and the negative exponential drives the kernel value toward zero.

Sigma plays a crucial role in the performance of the kernel and should be carefully tuned. When sigma is high, the decision boundary is smoother (less curved); when sigma is low, the decision boundary is more curved and follows the training data more closely.

4. Hyperbolic Tangent Kernel (Sigmoid):
The hyperbolic tangent kernel is also known as the sigmoid kernel and as the Multilayer Perceptron (MLP) kernel. The sigmoid kernel comes from the neural networks field. It is given by K(x, y) = tanh(k*(x*y) + c), where k is the slope and c is the intercept constant.
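
The four kernels above can be sketched in plain Python. This is a minimal illustration, not a library implementation. The parameterizations are assumptions based on the common textbook forms: the polynomial kernel is taken as (xi*xj + c)^d and the Gaussian kernel as exp(-||x-y||^2 / (2*sigma^2)). The function names and the landmark, near and far points are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(x, y, c=0.0):
    # Inner product plus an optional constant c.
    return dot(x, y) + c


def polynomial_kernel(xi, xj, d=2, c=0.0):
    # Assumed form: (xi*xj + c)^d, where d is the degree of the polynomial.
    return (dot(xi, xj) + c) ** d


def gaussian_kernel(x, y, sigma=1.0):
    # Assumed form: exp(-||x - y||^2 / (2 * sigma^2)).
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def sigmoid_kernel(x, y, k=1.0, c=0.0):
    # tanh(k * (x*y) + c): k is the slope, c the intercept constant.
    return math.tanh(k * dot(x, y) + c)


landmark = (1.0, 2.0, 3.0)
near = (1.0, 2.0, 3.1)   # hypothetical point close to the landmark
far = (9.0, 9.0, 9.0)    # hypothetical point far from the landmark

print(gaussian_kernel(near, landmark))             # close to one
print(gaussian_kernel(far, landmark))              # close to zero
print(gaussian_kernel(far, landmark, sigma=10.0))  # larger sigma decays more slowly
```

With a larger sigma the kernel value falls off more slowly with distance, which is why a high sigma gives a smoother decision boundary.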

There are other kernels as well: Power Kernel, Laplacian Kernel, ANOVA Kernel, Bessel Kernel, Wave Kernel, Log Kernel, Circular Kernel, Spherical Kernel, Multiquadric Kernel, Inverse Multiquadric Kernel, Cauchy Kernel, Chi-Square Kernel, Spline Kernel, B-Spline Kernel, Bayesian Kernel, Wavelet Kernel.

How does one choose a kernel?

In practice, we try several different kernels, cross-validate the performance of each, and use the kernel that performs best for the given problem.
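
The selection procedure just described might look like the following scikit-learn sketch. The toy dataset (make_moons), the 5-fold split, and the default hyperparameters are assumptions chosen for illustration. A real problem would use its own data and would usually tune parameters such as C and gamma as well.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy two-class dataset that is not linearly separable (illustrative only).
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

scores = {}
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    # Scale the features, then fit an SVM with the given kernel.
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    # Mean accuracy over 5 cross-validation folds.
    scores[kernel] = cross_val_score(model, X, y, cv=5).mean()

best_kernel = max(scores, key=scores.get)
print(scores)
print("Best kernel:", best_kernel)
```

The kernel with the highest mean cross-validation score is the one to keep for this dataset.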

Now let us implement an SVM kernel classifier using the Python scikit-learn library. The complete code is available at the GitHub link.
