Python pandas dataframe: populate nans with conditional average

Question

I have the following DataFrame:

 import numpy as np
 import pandas as pd

 df = pd.DataFrame(data={'Cat' : ['A', 'A', 'A', 'B', 'B', 'A', 'B'],
                         'Vals' : [1, 2, 3, 4, 5, np.nan, np.nan]})

   Cat  Vals
 0   A     1
 1   A     2
 2   A     3
 3   B     4
 4   B     5
 5   A   NaN
 6   B   NaN

And I want indexes 5 and 6 filled with the mean of "Vals" conditional on the column "Cat", namely 2 and 4.5.

The following code works fine:

 means = df.groupby('Cat').Vals.mean()
 for i in df[df.Vals.isnull()].index:
     df.loc[i, 'Vals'] = means[df.loc[i].Cat]

   Cat  Vals
 0   A     1
 1   A     2
 2   A     3
 3   B     4
 4   B     5
 5   A     2
 6   B   4.5

But I'm looking for something nicer, like

 df.Vals.fillna(df.Vals.mean(Conditionally to column 'Cat'))

Edit: I found this version, which is one line shorter, but I'm still not happy with it:

 means = df.groupby('Cat').Vals.mean()
 df.Vals = df.apply(lambda x: means[x.Cat] if pd.isnull(x.Vals) else x.Vals, axis=1)

python pandas nan fill

Niourf Oct 31 '15 at 10:13

1 answer

unutbu · Answer 1 · 2015-10-31T22:46:51+0000

We want to associate the Cat values with the NaN locations. In pandas, such associations are always made through the index, so it is natural to set Cat as the index:

 df = df.set_index(['Cat'])

Once this is done, fillna works as desired, with means being the per-category means computed in the question:

 df['Vals'] = df['Vals'].fillna(means)
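To see why the index matters, here is a minimal sketch (not part of the original answer) that reuses the question's df and means. The names `unaligned` and `aligned` are illustrative only. With the default integer index, none of the labels in means match, so nothing gets filled; with Cat as the index, each NaN picks up the mean stored under its label.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Cat': ['A', 'A', 'A', 'B', 'B', 'A', 'B'],
                   'Vals': [1, 2, 3, 4, 5, np.nan, np.nan]})

# Per-category means; the resulting Series is indexed by the Cat labels 'A' and 'B'.
means = df.groupby('Cat')['Vals'].mean()

# Default index 0..6: fillna aligns on labels, and no integer label matches
# 'A' or 'B', so both NaNs are left in place.
unaligned = df['Vals'].fillna(means)

# Cat as the index: the labels now line up with those of means,
# so each NaN is replaced by the mean of its own category.
aligned = df.set_index('Cat')['Vals'].fillna(means)

print(unaligned.isna().sum())
print(aligned.tolist())
```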

To turn Cat back into a column, you can use reset_index:

 df = df.reset_index()

 import pandas as pd
 import numpy as np

 df = pd.DataFrame(
     {'Cat' : ['A', 'A', 'A', 'B', 'B', 'A', 'B'],
      'Vals' : [1, 2, 3, 4, 5, np.nan, np.nan]})

 # Per-category means, indexed by the Cat labels
 means = df.groupby(['Cat'])['Vals'].mean()

 # Align on Cat, fill the NaNs, then restore Cat as a column
 df = df.set_index(['Cat'])
 df['Vals'] = df['Vals'].fillna(means)
 df = df.reset_index()
 print(df)

gives

   Cat  Vals
 0   A   1.0
 1   A   2.0
 2   A   3.0
 3   B   4.0
 4   B   5.0
 5   A   2.0
 6   B   4.5
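As a possible alternative that is not part of the answer above, the same alignment can be had without touching the index by using groupby with transform, which returns the group mean broadcast back onto the original row index. This is a sketch under that assumption; the name `group_means` is illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Cat': ['A', 'A', 'A', 'B', 'B', 'A', 'B'],
                   'Vals': [1, 2, 3, 4, 5, np.nan, np.nan]})

# transform('mean') yields one value per original row (the mean of that
# row's category), so the result shares df's index and aligns directly.
group_means = df.groupby('Cat')['Vals'].transform('mean')

# Only the NaN positions are replaced; existing values are kept.
df['Vals'] = df['Vals'].fillna(group_means)
print(df)
```

This keeps Cat as a column throughout, at the cost of building a full-length Series of means rather than one value per category.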
