In the last article, we learned about the types of machine learning algorithms.

Simple linear regression.

We learnt that in———– y, the variable being predicted, depends on the data fed to the system (x) according to the following equation.

y = f(x)

In more detail, the relation between x and y is

y = a + bx

This equation looks familiar, doesn’t it? That’s because we all learnt it back in school.

This is the equation of a sloped line drawn on the X-Y axes. For example, if X is the number of likes a person has on Facebook, Y could be that person’s popularity index.

In our case, Y is the dependent variable: the thing we are trying to predict. X is the independent variable: the data we use to make that prediction.

Simple linear regression involves only one independent variable. Here the dependent variable (Y) might or might not depend directly on the independent variable (X), but X has a definite effect on Y. We are trying to figure out this effect, or in this case let’s call it an association: a unit change in X is associated with a fixed change in Y.

If we consider the data of professionals, salary increases with experience: the more the experience, the higher the salary. If we plot a few of these combinations on the X-Y axes, X being the years of experience and Y being the salary they draw, we get the following result. Our aim is to draw the line that best fits all those points.

Therefore, in terms of simple linear regression, we can say

Salary = a + b * Experience

Now you must be wondering what the roles of ‘a’ and ‘b’ are in this equation. In mathematical terms we call them the ‘Constant’ and the ‘Coefficient’ respectively.

The point where our line touches the Y axis is called the Constant (also known as the intercept). In our example, ‘a’ is the salary the model predicts for a fresher, that is, someone with zero years of experience. And ‘b’ is the slope of the line. The steeper the slope, the more the salary rises with each additional year of experience.
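To make the roles of ‘a’ and ‘b’ concrete, here is a minimal Python sketch of the equation. The function name predict_salary and the numbers 30000 and 5000 are made up for illustration only; they do not come from any real dataset.

```python
def predict_salary(years_of_experience, a, b):
    """Return the salary given by the line Salary = a + b * Experience."""
    return a + b * years_of_experience


# Hypothetical values, chosen only to illustrate the equation.
a = 30000.0  # constant: predicted salary at zero years of experience
b = 5000.0   # coefficient: predicted increase per extra year of experience

fresher = predict_salary(0, a, b)     # equals the constant a
five_years = predict_salary(5, a, b)  # a plus five times the slope b
print(fresher, five_years)            # prints: 30000.0 55000.0
```

Notice that the prediction for a fresher is exactly ‘a’, and that every extra year adds exactly ‘b’ to the prediction.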

So the aim of this technique is to draw the line that best fits the data, to keep the predictions as precise as possible.

To achieve this, let’s draw vertical lines from the points we have plotted from our data to the best fitting line, i.e. the modelled line.

Here you can see that the plotted point is above the trend line. That means this person is earning more than he is predicted to. Lucky him! So the upper point is the actual value and the lower point is the modelled value, i.e. the value predicted by the model.

The difference between the two points denotes the error of the modelled line at that point. Now you can see that there are some points below the model line as well, so this difference can be negative. If we simply added the differences up, positive and negative errors would cancel each other out. To tackle this, simple linear regression minimizes the sum of the squares of these line lengths.
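The following Python sketch shows why the errors are squared before being added. The data points and the line (a = 30000, b = 5000) are invented for this example, and the function names are our own.

```python
def residuals(xs, ys, a, b):
    """Actual value minus modelled value (a + b * x) for every data point."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def sum_of_squares(xs, ys, a, b):
    """Sum of the squared vertical distances between the data and the line."""
    return sum(r ** 2 for r in residuals(xs, ys, a, b))


# Made-up data: years of experience and the salary actually drawn.
experience = [1, 2, 3, 4, 5]
salary = [37000, 38000, 46000, 51000, 53000]

# A hypothetical modelled line, not necessarily the best one.
a, b = 30000, 5000

errors = residuals(experience, salary, a, b)
print(errors)       # prints: [2000, -2000, 1000, 1000, -2000]
print(sum(errors))  # prints: 0, the positive and negative errors cancel out
print(sum_of_squares(experience, salary, a, b))  # prints: 14000000
```

The plain sum is 0 even though the line misses every point, while the sum of squares still reports how far off the line is.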

Conceptually, simple linear regression considers many candidate lines, and for every line it calculates the sum of the squared differences between the actual data and that line. The line with the smallest sum is the best fitting line.
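The idea of comparing many lines can be sketched in Python as below. This is only an illustration with invented data: the candidate ranges for ‘a’ and ‘b’ are arbitrary assumptions, and the function names are our own. For comparison, the sketch also computes the same line directly with the standard least-squares formulas (slope = sum of (x - mean x)(y - mean y) divided by sum of (x - mean x) squared, constant = mean y - slope * mean x), which give the minimizing line without any search.

```python
from itertools import product

# Made-up data: years of experience and the salary actually drawn.
experience = [1, 2, 3, 4, 5]
salary = [37000, 38000, 46000, 51000, 53000]


def sum_of_squares(xs, ys, a, b):
    """Sum of squared differences between the data and the line a + b * x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def best_line_by_search(xs, ys, a_candidates, b_candidates):
    """Try every (a, b) pair and keep the one with the smallest sum of squares."""
    return min(
        product(a_candidates, b_candidates),
        key=lambda ab: sum_of_squares(xs, ys, ab[0], ab[1]),
    )


def best_line_by_formula(xs, ys):
    """Compute the least-squares constant and coefficient directly."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx
    a = mean_y - b * mean_x
    return a, b


# Arbitrary candidate grids, assumed only for this illustration.
searched = best_line_by_search(
    experience, salary, range(25000, 35001, 500), range(3000, 6001, 500)
)
exact = best_line_by_formula(experience, salary)
print(searched)  # prints: (31500, 4500)
print(exact)     # prints: (31500.0, 4500.0)
```

Here the search and the formula agree only because the true best line happens to lie on the candidate grid. In general a search can only be as precise as its grid, which is why the formula is the more practical choice.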

That’s how simple linear regression works. Pretty simple, right?

Well, we look forward to meeting you in the next article.
