SVM (Support Vector Machine) is a supervised machine learning algorithm which is mainly used to classify data into different classes.
SVM can be used for both classification and regression problems.Though it is mainly known for classification problems.Unlike most algorithms,SVM makes use of a hyperplane which acts like a decision boundary between the various classes.
If you closely look at Fig.1,there can be several boundaries(one solid line and three dashed line) that correctly divide the data points.
SVM differs from the other classification algorithms in the way that that it chooses the decision boundary that maximizes the distance from the nearest data points of all the classes.
SVM Algorithm Steps:
- Start by drawing a random hyperplane.
- Then check the distance between the hyperplane and the closest data points from each class.
- These closest data points to the hyperplane are known as support vectors.
- The hyperplane is drawn based on these support vectors will have a maximum distance from each of the support vectors.
- And this distance between the hyperplane and the support vectors is known as the margin.
- Select the hyperplane in which margin should be maximum,called maximum margin classifier.
This is how SVM works. Practical scenarios where SVM is applicable are :
- Face Detection.
- Classification of Images.
- Handwriting Recognition.
- BioInformatics-Cancer Recognition.
- Environmental Sciences-Geo Data analysis.
Therefore to sum up,SVM is a technique used to classify linear separable data on the basis of maximum distance between two strategically chosen vectors.The dataset looks somewhat like this
Now let us implement SVM linear svm classifier using Python Scikit-learn library and please find the github link to access the complete code.
But what if the problem dataset looks like at Fig-5.How can we draw a line through these points to divide them in two classes? We can’t,right?Therefore this kind of problem is called as Linear