Multicollinearity

It occurs when two or more independent variables in dataset are correlated to each other or to the linear combination of two or more independent variables. Multicollinearity plays an important role during regression analysis.

As explained earlier, that regression coefficients are the weightage of impact of corresponding independent variable while predicting the value of dependent variable when other independent variables are kept constant.

But multicollinearity says that change in one independent variable may impact the value of other independent variable to which it is correlated. This statement basically contardicts the definition of regression coefficients calculation given above in red.

Thus multicollinearity may cause estimation of regression coeffiecients unreliable and may mislead our calculations. Hence it is advised if there is bit little multicollinearity, one should go ahead with the analysis of regression coefficients and if multicollinearity is great, one should deal with it first and then only should go to analysis.

Now, how will you check multicollinearity of any independent variable. For this, we calculate coefficient of multiple determination denoted as R2. for each independent variable.We will do so by regressing each independent variable once. Let’s see how all this works.

First, remove the actual dependent variable from the dataset i.e Price.

Now, we need to regress every ith independent variable (xi) over other independent variables i.e make xi as depdendent variable and use other independent variable to predict xi and calculate its coeffiecient of multiple determination (R2i) using our all-time favourite and the easiet to use tool EXCEL.

So, just open your daatset in excel. And go to tab ‘Data’ and click on ‘Data Analysis’ button.

A dialog box will appear. Select ‘Regression’ and click ‘OK’. Regression box will open.

Enter the range for your ne dependent variable in ‘Input Y range’ box and range for your rest independent variables in ‘Input X range’ box. Also, enter the column range where you ant output in ‘Output range’ box. Clcik ‘OK’.

In your output range column, you will get few details like Regression Model, Standard error, R2.,Slope and intercept. For our purpose , we need R2 as of now. So excel has done it for you without much efforts.

Please refer the excel sheet on below link to get R2 of every independent variable.

We have thumb rule to interpret R2 . If R2 is greater than 0.75 may cause potential problems but if it is greater than 0.9, it would become a serious problem and we need to redesign our data to eliminate it. But for our example, this is not the case since R2 for every indepdendent variable is far less than 0.75. So, we can go ahead.

We will come later with some other use cases where multicollinearity is greater than 0.75.

$${}$$