Pandas: How to fill null values with groupby mean?

I have a dataset with missing values:

    id  category  value
    1   A         NaN
    2   B         NaN
    3   A         10.5
    4   C         NaN
    5   A         2.0
    6   B         1.0

I need to fill the nulls so I can use the data in a model. The first time a category appears, its value is always null. For categories such as A and B, which have more than one row, I want to replace the nulls with the mean of that category. For category C, which has only a single row, I just want to fill in the mean of the rest of the data.

I know I can handle cases like C by filling with the mean of all rows, as in the line below, but I'm stuck on computing the per-category means for A and B and replacing the nulls with them.

 df['value'] = df['value'].fillna(df['value'].mean()) 

I need the final df to look like this:

    id  category  value
    1   A         6.25
    2   B         1.0
    3   A         10.5
    4   C         4.15
    5   A         2.0
    6   B         1.0
1 answer

I think you can use groupby and apply fillna with the mean of each group. A category whose values are all NaN is still NaN after that step, so fill the remaining NaN with the mean of the whole column:

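    # group_keys=False keeps the original row index, so the result aligns with df
    df['value'] = df.groupby('category', group_keys=False)['value'].apply(lambda x: x.fillna(x.mean()))
    # category C is still NaN at this point, so fall back to the mean of the whole column
    df['value'] = df['value'].fillna(df['value'].mean())
    print(df)

       id category  value
    0   1        A   6.25
    1   2        B   1.00
    2   3        A  10.50
    3   4        C   4.15
    4   5        A   2.00
    5   6        B   1.00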
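As a possible alternative, the same two-step fill can be sketched with transform instead of apply. This is only a sketch: the DataFrame is rebuilt from the sample in the question, and the names group_mean, filled and original_mean are my own, not from the original post.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "category": ["A", "B", "A", "C", "A", "B"],
    "value": [np.nan, np.nan, 10.5, np.nan, 2.0, 1.0],
})

# Mean of the values that were actually observed, kept only for comparison.
original_mean = df["value"].mean()

# For every row, the mean of that row's category.
# A category with no observed values (C) gets NaN here.
group_mean = df.groupby("category")["value"].transform("mean")

# Step 1: fill each NaN with the mean of its own category.
filled = df["value"].fillna(group_mean)

# Step 2: anything still NaN falls back to the column mean,
# computed AFTER step 1 so the category-filled values are included.
df["value"] = filled.fillna(filled.mean())

print(df)
```

The order of the two steps matters for category C. After step 1 the column holds 6.25, 1.0, 10.5, 2.0 and 1.0, whose mean is 20.75 / 5 = 4.15, the value in the desired output. Taking the mean of only the originally observed values instead would give (10.5 + 2.0 + 1.0) / 3 = 4.5.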

Source: https://habr.com/ru/post/1258924/

