Median pandas dataframe

I have a DataFrame df:

    name  count
    aaaa   2000
    bbbb   1900
    cccc    900
    dddd    500
    eeee    100

I would like to see rows that are within 10 times of the median of the count column.

I tried df['count'].median() and got the median, but I don't know how to proceed from there. Can you suggest how I could use pandas / numpy for this?

Expected Result:

    name  count  distance from median
    aaaa   2000  *****

I can use any measure as the distance from the median (absolute deviation from the median, quantile, etc.).

2 answers

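If you are looking for how to calculate the median absolute deviation: take each value's absolute distance from the column median, then take the median of those distances.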

    In [1]: df['dist'] = abs(df['count'] - df['count'].median())

    In [2]: df
    Out[2]:
       name  count  dist
    0  aaaa   2000  1100
    1  bbbb   1900  1000
    2  cccc    900     0
    3  dddd    500   400
    4  eeee    100   800

    In [3]: df['dist'].median()
    Out[3]: 800.0
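To get from a distance column to the rows the question asks for, one option is a boolean mask on that distance. The following is a minimal sketch, not the answerer's code: the cutoff `k` and the names `mad_value` and `near_median` are illustrative assumptions, and the choice of "within `k` MADs of the median" is just one possible reading of "close to the median".

```python
import pandas as pd

# Rebuild the DataFrame from the question.
df = pd.DataFrame({
    "name": ["aaaa", "bbbb", "cccc", "dddd", "eeee"],
    "count": [2000, 1900, 900, 500, 100],
})

# Use df["count"], not df.count: the attribute form resolves to the
# DataFrame.count method rather than the column.
median = df["count"].median()
df["dist"] = (df["count"] - median).abs()

# Median absolute deviation of the column.
mad_value = df["dist"].median()

# Hypothetical cutoff: keep rows no further than k MADs from the median.
k = 1.0
near_median = df[df["dist"] <= k * mad_value]
print(near_median)
```

With `k = 1.0` this keeps the rows cccc, dddd and eeee; a larger `k` widens the band, and a fixed absolute or percentage threshold could be substituted for `k * mad_value` in the same mask.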

The median absolute deviation for a column can also be calculated with statsmodels.robust.scale.mad, which accepts a normalization constant c. Setting c=1, as here, returns the raw (unscaled) value.

    >>> from statsmodels.robust.scale import mad
    >>> mad(df['count'], c=1)
    800.0
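If statsmodels is not available, the same number can be reproduced with NumPy alone. This is a rough sketch under stated assumptions: the helper name `mad_unscaled` is hypothetical, and dividing by `c` is intended to mirror how the normalization constant is applied in statsmodels, so `c=1` leaves the value unscaled.

```python
import numpy as np

def mad_unscaled(values, c=1.0):
    """Median absolute deviation: median(|x - median(x)|) / c.

    Hypothetical helper; c=1.0 means no rescaling.
    """
    arr = np.asarray(values, dtype=float)
    center = np.median(arr)
    return float(np.median(np.abs(arr - center)) / c)

# The 'count' column from the question.
counts = [2000, 1900, 900, 500, 100]
result = mad_unscaled(counts)
print(result)
```

For the data in the question this agrees with the 800.0 shown for both answers.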

Source: https://habr.com/ru/post/985626/

