Earlier we discussed how the mean is calculated and why it is useful in statistics and data science.

Here we will talk about the median. Let's get started.

### Median:

• The median is also an important measure of central tendency.
• Like the mean, it represents the dataset with a single number.
• The median is the value of the variable that divides the whole distribution into two equal parts.
• To calculate the median, the data must first be arranged in ascending or descending order of magnitude.

For an odd number of observations, the median is the middle value of the sorted data.

Median = ((n+1)/2)th term

Example: Find the median of the given observations – 2, 4, 6, 8, 12, 14, 18

First, check whether the data is sorted. It is, so we can find the median directly.
There are 7 observations, so the median is the (7+1)/2 = 4th element.

The fourth element in the given data is 8, so 8 is the median of the dataset.

For an even number of observations, there are two middle values, and the median is the arithmetic mean of these two values.

Median = ((n/2)th term + ((n/2)+1)th term) / 2

Example: Find the median of the given observations – 1, 3, 6, 7, 11, 15

First, check whether the data is sorted. It is, so we can find the median directly.
There are 6 observations, so by the formula above the median is (3rd element + 4th element)/2 = (6 + 7)/2 = 13/2 = 6.5
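The two formulas above can be sketched in a few lines of plain Python. This is only a minimal illustration, not a library implementation: the function name `median` is our own choice, and it sorts a copy of the input first, since both formulas assume ordered data.

```python
def median(values):
    """Return the median using the odd/even position rules described above."""
    ordered = sorted(values)  # the formulas assume sorted data
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty dataset is undefined")
    mid = n // 2
    if n % 2 == 1:
        # odd n: the ((n+1)/2)th term sits at index n // 2 (0-based)
        return ordered[mid]
    # even n: mean of the (n/2)th and ((n/2)+1)th terms
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_result = median([2, 4, 6, 8, 12, 14, 18])
even_result = median([1, 3, 6, 7, 11, 15])
print(odd_result)   # 8
print(even_result)  # 6.5
```

In practice you would usually reach for `statistics.median` from the standard library, which follows the same rules.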

## Python

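import pandas as pd

df = pd.read_csv(r"C:\Users\windows\Desktop\iris\iris-data.csv")    #same file as in the R and SAS examples

df.head()    #showing top 5 records to validate the data

df.sepal_length_cm.median()

# Output- 5.7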


## R

iris_data <- read.csv("C:\\Users\\windows\\Desktop\\iris\\iris-data.csv")
median(iris_data$sepal_length_cm)

# Output- 5.7


## SAS

proc import out = test  datafile = 'c:\users\windows\desktop\iris\iris-data.csv' dbms = csv replace ;
getnames=yes;  /*to import iris dataset*/
run;
proc means data = test ; /*to calculate median of sepal_length*/
var sepal_length_cm;
output out= test_sepal median(sepal_length_cm)=med_sepal;
run;
proc print data = test_sepal ; /*to print median value*/
run;


### Merits of Median:

• It is not affected by extremely small or extremely large values, since it depends only on the middle values.
• It is easy to understand and compute.
• It is rigidly defined, so different people computing it for the same data arrive at the same value.
• For skewed data or data with outliers, the median often describes the typical value better than the mean.
• In the data preparation step of a data science problem, median imputation is commonly used to fill missing values when the data contains outliers.
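To make the first and last points concrete, here is a small sketch using only the standard library. The numbers are made up for illustration, and filling `None` entries in a list is just one simple way to picture median imputation. It is not the only strategy.

```python
from statistics import mean, median

# Hypothetical values, then the same values with one extreme outlier added
salaries = [30, 32, 35, 38, 40]
with_outlier = salaries + [1000]

mean_before, mean_after = mean(salaries), mean(with_outlier)
median_before, median_after = median(salaries), median(with_outlier)
print(mean_before, mean_after)      # the mean jumps from 35 to about 195.8
print(median_before, median_after)  # the median barely moves: 35 -> 36.5

# Median imputation: replace missing entries (None) with the median
# of the observed values, so the outlier does not distort the fill value.
raw = [30, None, 35, 1000, None, 32]
observed = [x for x in raw if x is not None]
fill_value = median(observed)
imputed = [fill_value if x is None else x for x in raw]
print(imputed)  # [30, 33.5, 35, 1000, 33.5, 32]
```

Filling with the mean of the observed values (274.25 here) would have pushed the imputed entries far away from the typical value.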

### Demerits of Median:

• For an even number of observations, the median is only an estimate: it is the mean of the two middle values and need not be an actual observation in the data.
• It does not utilize all the observations. For example, the median of 5, 8, 9 is 8. If 9 is replaced by any number greater than or equal to 8, and 5 is replaced by any number less than or equal to 8, the median is unchanged. The actual values of 5 and 9 are therefore not being used.
• It is affected by sampling fluctuations. Sampling fluctuation is the extent to which a statistic takes on different values from sample to sample.
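The 5, 8, 9 example is easy to check. In this quick sketch the replacement values (-100 and 5000) are arbitrary choices for illustration:

```python
from statistics import mean, median

original = [5, 8, 9]
changed = [-100, 8, 5000]  # 5 replaced by a value <= 8, 9 by a value >= 8

print(median(original))  # 8
print(median(changed))   # 8, unchanged

# The mean, by contrast, uses every observation and shifts a lot.
print(mean(original) == mean(changed))  # False
```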

A few interview questions on the median:

Question-1- Find the median of the numbers 12, 5, 4, 102, 95.

Answer– The data must first be arranged in ascending or descending order. These numbers are not sorted, so sort them first: 4, 5, 12, 95, 102. There are 5 observations, so the median is the (5+1)/2 = 3rd element, which is 12.

Question-2- What are the most commonly used strategies to fill NA values in a data science or statistics problem?

Question-3– Is the median affected by sampling fluctuations?

Answer– Yes, different samples from the same population can give different medians.
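A small simulation shows the idea. Everything here is an assumption made for illustration: the population (the integers 1 to 1000), the sample size of 25, and the fixed seed that makes the run repeatable.

```python
import random
from statistics import median

random.seed(0)  # fixed seed so repeated runs draw the same samples
population = list(range(1, 1001))  # hypothetical population: 1..1000

# Draw five random samples of 25 values each and record each sample's median
sample_medians = [median(random.sample(population, 25)) for _ in range(5)]

print(sample_medians)      # the sample medians vary from sample to sample
print(median(population))  # 500.5, the population median
```

The sample medians scatter around the population median, and that scatter is the sampling fluctuation.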

That’s all I have. Thanks a lot for reading. Please let me know if you have any corrections or suggestions, and do share and comment if you like the post. Thanks in advance… 😉

Thanks to Jigyasa Srivastava for helping us grow day by day. She has a very good command of statistics and loves to solve analytical problems.
