A measure of central tendency is a summary statistic that represents the center point of a dataset. As the name suggests, it describes the tendency of the data to cluster around a central value. These measures indicate where most values in a distribution fall and are also referred to as the central location of a distribution.

In statistics, the three most common measures of central tendency are the mean, median and mode. Let's discuss them one by one.

### Mean:

• It helps us to represent our data in a single number.
• It is defined as the sum of all observations divided by the total number of observations.
• It is the most used measure of central tendency.
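The definition above can also be written as a formula. The notation here is an assumption for illustration (the post itself does not use symbols): $x_1, \dots, x_n$ are the observations, $n$ is their count, and $\bar{x}$ is the mean.

$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$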

Let’s say we have the marks of 10 students: 21, 23, 12, 34, 23, 13, 42, 23, 12, 10. To find the mean of these numbers:

mean = (sum of the marks) / (number of marks) = 213 / 10 = 21.3.
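The same arithmetic can be checked with a few lines of plain Python. This is only a small sketch using the standard library; the variable names are chosen here for illustration.

```python
from statistics import mean

# Marks of the 10 students from the example above
marks = [21, 23, 12, 34, 23, 13, 42, 23, 12, 10]

total = sum(marks)             # sum of all observations
count = len(marks)             # number of observations
manual_mean = total / count    # sum divided by count

print(total, count)            # 213 10
print(manual_mean)             # 21.3
print(mean(marks))             # 21.3, same value via the standard library
```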

## Python

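import pandas as pd

# load the same iris CSV file that the R and SAS examples use
df = pd.read_csv("C:\\Users\\windows\\Desktop\\iris\\iris-data.csv")

df.head()    # show the top 5 records to validate the data

df.sepal_length_cm.mean()

# Output- 5.64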


## R

iris_data <- read.csv("C:\\Users\\windows\\Desktop\\iris\\iris-data.csv")

mean(iris_data$sepal_length_cm)

# Output- 5.64


## SAS

proc import out = test  datafile = "c:\users\windows\desktop\iris\iris-data.csv" dbms = csv replace ;  /*to import dataset in SAS*/

getnames=yes;  /*use the first row of the file as variable names*/
run;

proc print data = test; /*to print the dataset*/
run;

proc means data= test; /*to find the mean of every numeric variable*/
run;


### Merits of Mean:

• It can be easily calculated and easily understood. This is why it is the most used measure of central tendency.
• It is rigidly defined, so different people computing it from the same data arrive at the same value.
• As every item is used in the calculation, it is affected by every item.
• It is the measure least affected by sampling fluctuation.

The term sampling fluctuation means variation between samples. Every sample is different, because we select only a few units from the entire population for the purpose of a study.

So the statistics (mean, median, mode, s.d. etc.) from one sample may differ from those of another sample. This variation is known as sampling fluctuation, and the mean is considered to be less affected by it than the other measures.
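Sampling fluctuation can be seen in a small simulation. The sketch below rests on assumptions that are not part of the post: a hypothetical, roughly bell-shaped population of 10,000 values, 2,000 repeated samples of 25 units each, and a fixed random seed. It records the mean and median of every sample and compares how much each statistic varies from sample to sample.

```python
import random
from statistics import mean, median, pstdev

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population (an assumption): 10,000 roughly bell-shaped values
population = [rng.gauss(50, 10) for _ in range(10_000)]

sample_means = []
sample_medians = []
for _ in range(2_000):
    sample = rng.sample(population, 25)    # 25 units drawn without replacement
    sample_means.append(mean(sample))
    sample_medians.append(median(sample))

# Spread of each statistic across the repeated samples
spread_of_means = pstdev(sample_means)
spread_of_medians = pstdev(sample_medians)

print("spread of sample means:  ", round(spread_of_means, 2))
print("spread of sample medians:", round(spread_of_medians, 2))
```

For a population like this one, the sample means should vary less than the sample medians. The result depends on the shape of the population, so for heavily skewed data or data with many outliers the comparison can come out differently.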

### Demerits of Mean:

• A single item can bring a big change in the result. For example, if there are three terms 4, 5, 6, the mean is 5. If we add a new term 81, the new mean is (4+5+6+81)/4 = 96/4 = 24. This is a big change compared to the previous value.
• When outliers exist in the dataset, don’t use the mean as an imputation strategy to fill “na” values.
• The mean can’t be computed for anything other than numerical values.
• Sometimes it gives laughable conclusions, e.g. if there are 30, 40 and 55 students in three classes, then the average number of students is (30+40+55)/3 ≈ 41.67, which is impossible as students can’t come in fractions.
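The effect of a single extreme value is easy to reproduce. The sketch below reuses the terms 4, 5, 6 and 81 from the first bullet and, as an addition for comparison only, also prints the median, which is discussed later in this series.

```python
from statistics import mean, median

values = [4, 5, 6]
with_outlier = values + [81]    # add one extreme value

print(mean(values), median(values))              # 5 5
print(mean(with_outlier), median(with_outlier))  # 24 5.5
```

The mean jumps from 5 to 24 while the median barely moves. This suggests why the median is often a safer choice than the mean for filling missing values when outliers are present.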

A few interview questions on the mean:

Question-1- Can we use the mean for non-numerical values? Can it be impacted by extreme values?

Answer- No, we can’t use the mean for non-numerical values. Yes, the mean is impacted by extreme values. Let's look at the same example again:
If there are three terms 4, 5, 6, the mean is 5. If we add a new term 81, the new mean is (4+5+6+81)/4 = 96/4 = 24. This is a big change compared to the previous value.

Question-2- What do you mean by sampling fluctuation? And why is the mean least affected by it?

Answer- When a random experiment is repeated a number of times and the set of observations is noted each time, the statistics computed from those sets vary from one trial to the next. This variation is called sampling fluctuation.
The set of observations may differ in each trial, but the degree of variation is usually small, and because the mean uses every observation, individual differences tend to average out. The sum of the observations therefore does not differ by large values between trials, and hence the mean is least affected by fluctuations of sampling.

That’s all I have, and thanks a lot for reading. Please let me know if you have any corrections or suggestions, and do share and comment if you like the post. Thanks in advance… 😉

Thanks to Jigyasa Srivastava for helping us grow day by day. She has a very good command of statistics and loves to solve analytical problems.
