As discussed in the overview section, simple linear regression aims to find the best-fit line through the data.
Consider the plot of salary against experience below.

To find the line that best fits the points in the scatter plot above, we use a metric called the “Sum of Squared Errors” (SSE): we compare a number of candidate lines and choose the one with the smallest SSE. The SSE is computed as

SSE = \sum_{i=1}^{n}(y_i-\hat{y}_i)^2 \qquad (1)

where y_i is the observed value, \hat{y}_i is the value predicted by the line, and n is the number of points.

Simple linear regression uses the linear equation

y = ax + b \qquad (2)
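As a rough illustration of comparing candidate lines by equation (1), the sketch below computes the SSE of two lines of the form y = ax + b. The data values, the function name `sse`, and both candidate lines are made up for illustration and are not read from the plot.

```python
def sse(xs, ys, a, b):
    """Sum of squared errors of the line y = a*x + b on the points (xs, ys)."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data: years of experience and salary (in thousands).
experience = [1, 2, 3, 4, 5]
salary = [30, 35, 45, 50, 60]

# Two arbitrary candidate lines; the one with the smaller SSE fits better.
sse_line_1 = sse(experience, salary, a=5.0, b=30.0)
sse_line_2 = sse(experience, salary, a=7.5, b=21.5)

print(sse_line_1, sse_line_2)  # prints: 75.0 7.5
```

On this toy data the second line has the smaller SSE, so it would be preferred over the first.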

To get the best-fit line, we look for the values of ‘a’ and ‘b’ in equation (2) that make the SSE as small as possible. This is the least squares method.
Substituting \hat{y}_i = ax_i + b into equation (1) gives

\sum_{i=1}^{n}(y_i-(ax_i+b))^2

This can be rewritten as

\sum_{i=1}^{n}((y_i-ax_i)-b)^2 \qquad (3)

Expand equation (3) using the identity

(u-v)^2 = u^2 + v^2 - 2uv

with u = y_i - ax_i and v = b:

\sum_{i=1}^{n}((y_i-ax_i)^2+b^2-2(y_i-ax_i)b)

Separating the b^2 term into its own sum gives

\sum_{i=1}^{n}((y_i-ax_i)^2-2(y_i-ax_i)b) + \sum_{i=1}^{n}b^2

Since

\sum_{i=1}^{n}b^2 = nb^2

this becomes

\sum_{i=1}^{n}((y_i-ax_i)^2-2(y_i-ax_i)b) + nb^2

Arranging this as a quadratic in b gives

nb^2 - 2b\sum_{i=1}^{n}(y_i-ax_i) + \sum_{i=1}^{n}(y_i-ax_i)^2 \qquad (4)

For any quadratic function of the form

f(t) = mt^2 + kt + c

with m > 0, the minimum is at the point where the derivative 2mt + k is zero:

t = -\frac{k}{2m}
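The minimum-point formula for a quadratic can be checked numerically. The sketch below is only an illustration: the helper name `quadratic_minimizer` and the coefficients 2, -8 and 3 are arbitrary choices, and the formula gives a minimum only when the leading coefficient is positive.

```python
def quadratic_minimizer(m, k):
    """Point that minimizes m*t**2 + k*t + c (c does not affect the location)."""
    if m <= 0:
        raise ValueError("a minimum exists only when m > 0")
    return -k / (2 * m)


def f(t, m=2.0, k=-8.0, c=3.0):
    """Example quadratic with arbitrary coefficients."""
    return m * t ** 2 + k * t + c


t_min = quadratic_minimizer(2.0, -8.0)

# The value at t_min is lower than at nearby points on either side.
print(t_min, f(t_min))  # prints: 2.0 -5.0
print(f(t_min) < f(t_min - 0.1), f(t_min) < f(t_min + 0.1))  # prints: True True
```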

Equation (4) is a quadratic in b with m = n and k = -2\sum_{i=1}^{n}(y_i-ax_i), so its minimum is at

b = -\frac{\sum_{i=1}^{n}-2(y_i-ax_i)}{2n}

b = \frac{\sum_{i=1}^{n}y_i}{n}-a\frac{\sum_{i=1}^{n}x_i}{n}

In this expression, the first term is the mean of the ‘y’ values and the second term is the product of ‘a’ and the mean of the ‘x’ values.
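To see the intercept formula at work, the sketch below fixes an arbitrary slope and checks that the intercept "mean of y minus a times mean of x" gives a smaller SSE than nearby intercepts. The data, the slope value and the function names are hypothetical and chosen only for illustration.

```python
def sse(xs, ys, a, b):
    """Sum of squared errors of the line y = a*x + b."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


def best_intercept(xs, ys, a):
    """Intercept minimizing SSE for a fixed slope a: mean(y) - a * mean(x)."""
    n = len(xs)
    return sum(ys) / n - a * sum(xs) / n


# Hypothetical data: years of experience and salary (in thousands).
experience = [1, 2, 3, 4, 5]
salary = [30, 35, 45, 50, 60]

a_fixed = 5.0  # arbitrary slope, not the best one
b_star = best_intercept(experience, salary, a_fixed)

sse_at_b_star = sse(experience, salary, a_fixed, b_star)
sse_below = sse(experience, salary, a_fixed, b_star - 1.0)
sse_above = sse(experience, salary, a_fixed, b_star + 1.0)

print(b_star)  # prints: 29.0
print(sse_below, sse_at_b_star, sse_above)  # prints: 75.0 70.0 75.0
```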
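Writing \bar{y} and \bar{x} for the means of the ‘y’ and ‘x’ values, the intercept becomes

b = \bar{y} - a\bar{x}

where

\bar{y} = \frac{\sum_{i=1}^{n}y_i}{n} and \bar{x} = \frac{\sum_{i=1}^{n}x_i}{n}

Substituting this value of b into

\hat{y} = ax + b

gives

\hat{y} = ax + \bar{y} - a\bar{x} = a(x-\bar{x}) + \bar{y}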

Substituting \hat{y}_i = a(x_i-\bar{x}) + \bar{y} into equation (1) gives

\sum_{i=1}^{n}(y_i-\hat{y}_i)^2 = \sum_{i=1}^{n}(y_i-a(x_i-\bar{x})-\bar{y})^2

= \sum_{i=1}^{n}((y_i-\bar{y})-a(x_i-\bar{x}))^2

= \sum_{i=1}^{n}((y_i-\bar{y})^2-2a(x_i-\bar{x})(y_i-\bar{y})+a^2(x_i-\bar{x})^2)

= \sum_{i=1}^{n}(a^2(x_i-\bar{x})^2-2a(x_i-\bar{x})(y_i-\bar{y})+(y_i-\bar{y})^2)

= a^2\sum_{i=1}^{n}(x_i-\bar{x})^2-2a\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})+\sum_{i=1}^{n}(y_i-\bar{y})^2

This is again a quadratic, this time in a, with m = \sum_{i=1}^{n}(x_i-\bar{x})^2 and k = -2\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y}). Its minimum is at

a = -\frac{-2\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{2\sum_{i=1}^{n}(x_i-\bar{x})^2}

a = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2} \qquad (5)
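The closed-form slope and intercept can be computed directly from the data. The following is a minimal sketch in plain Python, assuming the slope is the ratio of the summed cross-deviations to the summed squared x-deviations and the intercept is the mean of y minus the slope times the mean of x. The function name `fit_line` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Least squares slope and intercept for y = a*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of products of deviations and sum of squared x-deviations.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    a = s_xy / s_xx
    b = y_bar - a * x_bar
    return a, b


# Hypothetical data: years of experience and salary (in thousands).
experience = [1, 2, 3, 4, 5]
salary = [30, 35, 45, 50, 60]

a, b = fit_line(experience, salary)
print(a, b)  # prints: 7.5 21.5
```

For this toy data the fitted line is y = 7.5x + 21.5, meaning each extra year of experience adds about 7.5 (thousand) to the predicted salary.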

As we studied in Coefficient of Determination, the correlation coefficient is given by

r = \frac{\sum(x-\bar{x})(y-\bar{y})/n}{\sqrt{\sum(x-\bar{x})^2/n}\,\sqrt{\sum(y-\bar{y})^2/n}} \qquad (6)

The standard deviations s_x and s_y of x and y are given by

s_x = \sqrt{\frac{\sum(x-\bar{x})^2}{n}} \qquad (7)

s_y = \sqrt{\frac{\sum(y-\bar{y})^2}{n}} \qquad (8)

From equations (6), (7) and (8), the numerator of equation (5) equals nrs_xs_y and its denominator equals ns_x^2, so equation (5) can be rewritten as

a = \frac{rs_xs_y}{s_x^2}

a = \frac{rs_y}{s_x}

So the line for which the SSE is minimum is

\hat{y} = ax + b \qquad (9)

where

a = \frac{rs_y}{s_x} and b = \bar{y} - a\bar{x}
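The slope written in terms of the correlation and the standard deviations can be checked against the same toy data. This is a sketch under stated assumptions: the standard deviations divide by n (population form), the function name `slope_from_correlation` is made up, and the data are hypothetical.

```python
import math


def slope_from_correlation(xs, ys):
    """Return (slope, intercept, r) using slope = r * s_y / s_x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Population standard deviations (dividing by n).
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / n)
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / n)
    # Correlation: mean product of deviations over the product of deviations.
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    r = cov / (s_x * s_y)
    slope = r * s_y / s_x
    intercept = y_bar - slope * x_bar
    return slope, intercept, r


# Hypothetical data: years of experience and salary (in thousands).
experience = [1, 2, 3, 4, 5]
salary = [30, 35, 45, 50, 60]

slope, intercept, r = slope_from_correlation(experience, salary)
print(round(slope, 6), round(intercept, 6), round(r, 4))
```

Up to floating-point rounding, the slope and intercept should agree with the values obtained from the ratio of summed deviations (7.5 and 21.5 for this data).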

Equation (9) can be written as

\hat{y} = ax + \bar{y} - a\bar{x}

\hat{y}-\bar{y} = a(x-\bar{x})

\hat{y}-\bar{y} = \frac{rs_y}{s_x}(x-\bar{x})

where

\frac{rs_y}{s_x}

is known as the regression coefficient of y on x.
