Python pandas DataFrame: fill NaNs with a conditional mean

I have the following DataFrame:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(data={'Cat' : ['A', 'A', 'A','B', 'B', 'A', 'B'],
                            'Vals' : [1, 2, 3, 4, 5, np.nan, np.nan]})

      Cat  Vals
    0   A     1
    1   A     2
    2   A     3
    3   B     4
    4   B     5
    5   A   NaN
    6   B   NaN

I want rows 5 and 6 filled with the mean of "Vals" for each row's "Cat" value, namely 2 and 4.5.

The following code works fine:

    means = df.groupby('Cat').Vals.mean()
    for i in df[df.Vals.isnull()].index:
        df.loc[i, 'Vals'] = means[df.loc[i].Cat]

      Cat  Vals
    0   A     1
    1   A     2
    2   A     3
    3   B     4
    4   B     5
    5   A     2
    6   B   4.5

But I'm looking for something nicer, like

 df.Vals.fillna(df.Vals.mean(Conditionally to column 'Cat')) 

Edit: I found this version, which is one line shorter, but I'm still not happy with it:

    means = df.groupby('Cat').Vals.mean()
    df.Vals = df.apply(lambda x: means[x.Cat] if pd.isnull(x.Vals) else x.Vals, axis=1)
1 answer

We want to "link" the Cat values to the locations of the NaNs. In pandas, such associations are made by aligning on the index, so it is natural to set Cat as the index:

 df = df.set_index(['Cat']) 

Once this is done, fillna with the means Series from the question (indexed by Cat) fills each NaN with the mean of its category:

 df['Vals'] = df['Vals'].fillna(means) 
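
The reason this works is that fillna, when given a Series, matches fill values to rows by index label. A minimal sketch of that alignment on a toy Series (the names vals, fill and filled and the values 10.0 and 20.0 are made up for illustration and are not from the question):

```python
import numpy as np
import pandas as pd

# Toy data: the index plays the role of 'Cat'; labels may repeat
vals = pd.Series([1.0, np.nan, np.nan], index=['A', 'A', 'B'])

# One fill value per label (hypothetical numbers)
fill = pd.Series({'A': 10.0, 'B': 20.0})

# Each NaN is replaced by the fill value carrying the same index label;
# non-missing entries are left untouched
filled = vals.fillna(fill)
print(filled.tolist())  # [1.0, 10.0, 20.0]
```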

To turn Cat back into a column, you can of course use reset_index:

 df = df.reset_index() 

Putting it all together:

    import pandas as pd
    import numpy as np

    df = pd.DataFrame(
        {'Cat' : ['A', 'A', 'A','B', 'B', 'A', 'B'],
         'Vals' : [1, 2, 3, 4, 5, np.nan, np.nan]})

    # per-category means, indexed by Cat
    means = df.groupby(['Cat'])['Vals'].mean()

    # align on Cat, fill the NaNs, then restore Cat as a column
    df = df.set_index(['Cat'])
    df['Vals'] = df['Vals'].fillna(means)
    df = df.reset_index()
    print(df)

gives

      Cat  Vals
    0   A   1.0
    1   A   2.0
    2   A   3.0
    3   B   4.0
    4   B   5.0
    5   A   2.0
    6   B   4.5
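
If the round trip through the index is not wanted, two other idioms may come closer to the one-liner the question asks for. This is a sketch of alternatives, not part of the answer above: groupby(...).transform('mean') returns the group mean for every row, already aligned with the original index, and Series.map looks up each row's category in means. The variable names group_means, via_transform and via_map are chosen here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Cat': ['A', 'A', 'A', 'B', 'B', 'A', 'B'],
                   'Vals': [1, 2, 3, 4, 5, np.nan, np.nan]})

# Option 1: transform('mean') yields one value per row (the mean of that
# row's group, NaNs ignored), indexed like df, so fillna aligns directly
group_means = df.groupby('Cat')['Vals'].transform('mean')
via_transform = df['Vals'].fillna(group_means)

# Option 2: compute per-category means once, then map each row's
# category to its mean and use that as the fill value
means = df.groupby('Cat')['Vals'].mean()
via_map = df['Vals'].fillna(df['Cat'].map(means))

print(via_transform.tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0, 2.0, 4.5]
```

Both variants leave df itself unchanged until the result is assigned back, for example with df['Vals'] = via_transform.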

Source: https://habr.com/ru/post/1234931/

