I have the following DataFrame:
    import numpy as np
    import pandas as pd

    df = pd.DataFrame(data={'Cat': ['A', 'A', 'A', 'B', 'B', 'A', 'B'],
                            'Vals': [1, 2, 3, 4, 5, np.nan, np.nan]})

      Cat  Vals
    0   A     1
    1   A     2
    2   A     3
    3   B     4
    4   B     5
    5   A   NaN
    6   B   NaN
I want the NaN entries at indexes 5 and 6 filled with the mean of "Vals" conditional on the column "Cat", namely 2 and 4.5.
The following code works fine:
    means = df.groupby('Cat').Vals.mean()
    for i in df[df.Vals.isnull()].index:
        df.loc[i, 'Vals'] = means[df.loc[i].Cat]

      Cat  Vals
    0   A     1
    1   A     2
    2   A     3
    3   B     4
    4   B     5
    5   A     2
    6   B   4.5
But I'm looking for something nicer, like
df.Vals.fillna(df.Vals.mean(Conditionally to column 'Cat'))
Edit: I found this version, which is one line shorter, but I'm still not happy with it:
    means = df.groupby('Cat').Vals.mean()
    df.Vals = df.apply(lambda x: means[x.Cat] if pd.isnull(x.Vals) else x.Vals, axis=1)
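For comparison, here is a possible sketch of the kind of conditional `fillna` described above. It is not taken from the original post. It assumes that `groupby(...).transform('mean')` is acceptable here; that call returns the per-group mean aligned to the original index, so it can be passed straight to `fillna`. The name `group_means` is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data={'Cat': ['A', 'A', 'A', 'B', 'B', 'A', 'B'],
                        'Vals': [1, 2, 3, 4, 5, np.nan, np.nan]})

# Mean of 'Vals' within each 'Cat' group, broadcast back to one value per row.
# NaN entries are skipped when the mean is computed.
group_means = df.groupby('Cat')['Vals'].transform('mean')

# Replace only the missing entries with the mean of their own group.
df['Vals'] = df['Vals'].fillna(group_means)

print(df)
```

With the sample data, rows 5 and 6 should end up with 2.0 and 4.5, matching the loop version, without an explicit Python-level loop or a row-wise `apply`.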