Python pandas: replace NA with the group median or mean in a DataFrame

Suppose we have a DataFrame df:

    A       B
    apple   1.0
    apple   2.0
    apple   NA
    orange  NA
    orange  7.0
    melon   14.0
    melon   NA
    melon   15.0
    melon   16.0

To replace NA we can use df["B"].fillna(df["B"].median()), but that fills every NA with the median of all the data in "B".
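
To make the problem concrete, here is a minimal sketch that rebuilds the frame above and applies the global fill. It assumes NA means NaN and that the columns are named "A" and "B", as the fillna call suggests; the variable name filled is only for illustration.

```python
import numpy as np
import pandas as pd

# Sample frame from the question; np.nan stands in for NA (an assumption).
df = pd.DataFrame({
    "A": ["apple", "apple", "apple", "orange", "orange",
          "melon", "melon", "melon", "melon"],
    "B": [1.0, 2.0, np.nan, np.nan, 7.0, 14.0, np.nan, 15.0, 16.0],
})

# Global fill: every NaN receives the median of the whole column,
# regardless of which group in "A" the row belongs to.
filled = df["B"].fillna(df["B"].median())
print(filled)
```

All three missing entries receive the same value here, which is what the question wants to avoid.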

Is it possible to replace each NA with the median of its own group in A instead (for example, as below)?

    A       B
    apple   1.0
    apple   2.0
    apple   **1.5**
    orange  **7.0**
    orange  7.0
    melon   14.0
    melon   **15.0**
    melon   15.0
    melon   16.0

Thanks!

2 answers

In pandas you can use transform to get the fill values for the missing entries:

    >>> med = df.groupby('A')['B'].transform('median')
    >>> df['B'].fillna(med)
    0     1.0
    1     2.0
    2     1.5
    3     7.0
    4     7.0
    5    14.0
    6    15.0
    7    15.0
    8    16.0
    Name: B, dtype: float64
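
A self-contained sketch of the same idea may help, with two additions that are not part of the answer above: the result is assigned to a variable (fillna returns a new Series rather than modifying df), and a group whose values are all missing falls back to the overall median. The extra "kiwi" row is hypothetical and only there to exercise that fallback.

```python
import numpy as np
import pandas as pd

# Frame from the question plus one hypothetical all-NaN group ("kiwi").
df = pd.DataFrame({
    "A": ["apple", "apple", "apple", "orange", "orange",
          "melon", "melon", "melon", "melon", "kiwi"],
    "B": [1.0, 2.0, np.nan, np.nan, 7.0, 14.0, np.nan, 15.0, 16.0, np.nan],
})

# transform returns a Series aligned with df's index that holds,
# for each row, the median of that row's group.
med = df.groupby("A")["B"].transform("median")

# First fill from the group median; a group with no observed values
# has a NaN median, so fall back to the overall median for those rows.
result = df["B"].fillna(med).fillna(df["B"].median())
print(result)
```

To keep the filled values in the frame, assign them back with df["B"] = result.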

In R you can use na.aggregate (from zoo) together with data.table to replace NA values with the group mean, which is what na.aggregate uses by default. We convert the "data.frame" to a "data.table" ( setDT(df) ), group by "A", and apply na.aggregate to "B".

    library(zoo)
    library(data.table)
    setDT(df)[, B := na.aggregate(B), A]
    df
    #        A    B
    #1:  apple  1.0
    #2:  apple  2.0
    #3:  apple  1.5
    #4: orange  7.0
    #5: orange  7.0
    #6:  melon 14.0
    #7:  melon 15.0
    #8:  melon 15.0
    #9:  melon 16.0
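
For readers staying in pandas, a rough counterpart of this group-mean fill can be sketched by swapping the aggregation name passed to transform. This is an illustration rather than a translation of the R code, and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Sample frame from the question; np.nan stands in for NA.
df = pd.DataFrame({
    "A": ["apple", "apple", "apple", "orange", "orange",
          "melon", "melon", "melon", "melon"],
    "B": [1.0, 2.0, np.nan, np.nan, 7.0, 14.0, np.nan, 15.0, 16.0],
})

# Per-row group mean, aligned with df's index (NaN values are skipped).
group_mean = df.groupby("A")["B"].transform("mean")

# Fill only the missing entries with their group's mean.
filled_mean = df["B"].fillna(group_mean)
print(filled_mean)
```

On this particular data the group means and group medians coincide, so the result matches the median-based fill.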

Source: https://habr.com/ru/post/1235388/

