Pandas: fillna on multiple columns with each column's mode

While working with census data, I want to replace the NaNs in two columns ("workclass" and "native-country") with the respective modes of those two columns. I can easily get the modes:

mode = df.filter(["workclass", "native-country"]).mode() 

which returns a dataframe:

   workclass native-country
 0   Private  United-States

but

 df.filter(["workclass", "native-country"]).fillna(mode) 

does not fill the NaNs in either column with anything, let alone with that column's mode. Is there a clean way to do this?

2 answers

If you want to fill missing values in some columns of a DataFrame df with each column's mode, you can pass fillna a Series, created by selecting the first row of the mode result with iloc:

 cols = ["workclass", "native-country"]
 df[cols] = df[cols].fillna(df.mode().iloc[0])

Or:

 df[cols]=df[cols].fillna(mode.iloc[0]) 

Or, if you prefer filter:

 df[cols]=df.filter(cols).fillna(mode.iloc[0]) 

Example:

 import numpy as np
 import pandas as pd

 df = pd.DataFrame({'workclass': ['Private', 'Private', np.nan, 'another', np.nan],
                    'native-country': ['United-States', np.nan, 'Canada', np.nan, 'United-States'],
                    'col': [2, 3, 7, 8, 9]})
 print(df)
    col native-country workclass
 0    2  United-States   Private
 1    3            NaN   Private
 2    7         Canada       NaN
 3    8            NaN   another
 4    9  United-States       NaN

 mode = df.filter(["workclass", "native-country"]).mode()
 print(mode)
   workclass native-country
 0   Private  United-States

 # iloc[0] turns the one-row mode DataFrame into a Series keyed by column name
 cols = ["workclass", "native-country"]
 df[cols] = df[cols].fillna(df.mode().iloc[0])
 print(df)
    col native-country workclass
 0    2  United-States   Private
 1    3  United-States   Private
 2    7         Canada   Private
 3    8  United-States   another
 4    9  United-States   Private
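As for why the attempt in the question appears to do nothing: when fillna is given a DataFrame, it aligns on both row labels and column labels, so a one-row mode frame can only fill cells in the row labelled 0. A Series indexed by column name is applied to every row of the matching column. The sketch below illustrates the difference on the same sample data; the names mode_df, mode_row, unchanged and filled are made up for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "workclass": ["Private", "Private", np.nan, "another", np.nan],
    "native-country": ["United-States", np.nan, "Canada", np.nan, "United-States"],
})
cols = ["workclass", "native-country"]

mode_df = df[cols].mode()    # one-row DataFrame whose only row label is 0
mode_row = mode_df.iloc[0]   # Series indexed by column name

# DataFrame argument: a cell is filled only if its row label AND column label
# both exist in mode_df, so only row 0 could ever be filled.
unchanged = df[cols].fillna(mode_df)

# Series argument: every NaN in a column gets the value stored under that column's name.
filled = df[cols].fillna(mode_row)

remaining_with_frame = int(unchanged.isna().sum().sum())
remaining_with_series = int(filled.isna().sum().sum())
print(remaining_with_frame, remaining_with_series)
```

Row 0 has no missing values here, so the DataFrame version leaves every NaN in place, while the Series version fills them all.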

You can do it as follows:

 df[["workclass", "native-country"]]=df[["workclass", "native-country"]].fillna(value=mode.iloc[0]) 

For instance,

 import pandas as pd

 d = {
     'key3': [1, 4, 4, 4, 5],
     'key2': [6, 6, 4],
     'key1': [6, 4, 4],
 }
 df = pd.DataFrame.from_dict(d, orient='index').transpose()

Then df is

    key3 key2 key1
 0     1    6    6
 1     4    6    4
 2     4    4    4
 3     4  NaN  NaN
 4     5  NaN  NaN

Then by doing:

 l = df.filter(["key1", "key2"]).mode()
 df[["key1", "key2"]] = df[["key1", "key2"]].fillna(value=l.iloc[0])

we get that df is

    key3 key2 key1
 0     1    6    6
 1     4    6    4
 2     4    4    4
 3     4    6    4
 4     5    6    4

Source: https://habr.com/ru/post/1265631/

