Pandas DataFrame: replace NaN values with the column mean

I have a pandas DataFrame filled mostly with real numbers, but it contains several NaN values.

How can I replace each NaN with the mean of the column it is in?

This question is very similar to this one: numpy array: replace nan values with average of columns, but unfortunately the solution given there does not work for a pandas DataFrame.

+100
python pandas nan
Sep 08 '13 at 23:54
8 answers

You can simply use DataFrame.fillna to fill the NaN values directly:

    In [27]: df
    Out[27]:
              A         B         C
    0 -0.166919  0.979728 -0.632955
    1 -0.297953 -0.912674 -1.365463
    2 -0.120211 -0.540679 -0.680481
    3       NaN -2.027325  1.533582
    4       NaN       NaN  0.461821
    5 -0.788073       NaN       NaN
    6 -0.916080 -0.612343       NaN
    7 -0.887858  1.033826       NaN
    8  1.948430  1.025011 -2.982224
    9  0.019698 -0.795876 -0.046431

    In [28]: df.mean()
    Out[28]:
    A   -0.151121
    B   -0.231291
    C   -0.530307
    dtype: float64

    In [29]: df.fillna(df.mean())
    Out[29]:
              A         B         C
    0 -0.166919  0.979728 -0.632955
    1 -0.297953 -0.912674 -1.365463
    2 -0.120211 -0.540679 -0.680481
    3 -0.151121 -2.027325  1.533582
    4 -0.151121 -0.231291  0.461821
    5 -0.788073 -0.231291 -0.530307
    6 -0.916080 -0.612343 -0.530307
    7 -0.887858  1.033826 -0.530307
    8  1.948430  1.025011 -2.982224
    9  0.019698 -0.795876 -0.046431

The fillna docstring says that value should be a scalar or a dict; however, it seems to work with a Series as well (current pandas documents value as a scalar, dict, Series, or DataFrame). If you want to pass a dict, you can use df.mean().to_dict().
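A minimal sketch of that equivalence on a small made-up frame: passing the Series returned by df.mean() or its dict form gives the same result, because both map column labels to fill values.

```python
import numpy as np
import pandas as pd

# Small example frame (made-up values) with a few NaNs
df = pd.DataFrame({
    "A": [1.0, np.nan, 3.0],
    "B": [np.nan, 4.0, 8.0],
})

means = df.mean()  # Series indexed by column name: A -> 2.0, B -> 6.0

filled_from_series = df.fillna(means)
filled_from_dict = df.fillna(means.to_dict())  # {'A': 2.0, 'B': 6.0}

print(filled_from_series)
print(filled_from_series.equals(filled_from_dict))  # True
```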

+178
Sep 09 '13 at 5:27

Try:

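    # Note: in recent pandas, inplace fillna on a selected column may not update sub2 (chained assignment)
    sub2['income'].fillna(sub2['income'].mean(), inplace=True)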
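If you want to modify sub2 in place without selecting the column first, one option is to call fillna on the whole DataFrame with a dict that names only the column to fill. A sketch; the sub2 frame below is a made-up stand-in, and its age column is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for sub2 with an 'income' column containing NaNs
sub2 = pd.DataFrame({"income": [100.0, np.nan, 300.0], "age": [30, 40, 50]})

# fillna on the DataFrame itself with a dict only touches the listed columns,
# so sub2 is modified in place without chained column selection
sub2.fillna({"income": sub2["income"].mean()}, inplace=True)

print(sub2)
```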
+32
Oct. 16 '15 at 20:18
    In [16]: df = DataFrame(np.random.randn(10,3))

    In [17]: df.iloc[3:5,0] = np.nan

    In [18]: df.iloc[4:6,1] = np.nan

    In [19]: df.iloc[5:8,2] = np.nan

    In [20]: df
    Out[20]:
              0         1         2
    0  1.148272  0.227366 -2.368136
    1 -0.820823  1.071471 -0.784713
    2  0.157913  0.602857  0.665034
    3       NaN -0.985188 -0.324136
    4       NaN       NaN  0.238512
    5  0.769657       NaN       NaN
    6  0.141951  0.326064       NaN
    7 -1.694475 -0.523440       NaN
    8  0.352556 -0.551487 -1.639298
    9 -2.067324 -0.492617 -1.675794

    In [22]: df.mean()
    Out[22]:
    0   -0.251534
    1   -0.040622
    2   -0.841219
    dtype: float64

Apply, for each column, that column's mean to fill its missing values:

    In [23]: df.apply(lambda x: x.fillna(x.mean()),axis=0)
    Out[23]:
              0         1         2
    0  1.148272  0.227366 -2.368136
    1 -0.820823  1.071471 -0.784713
    2  0.157913  0.602857  0.665034
    3 -0.251534 -0.985188 -0.324136
    4 -0.251534 -0.040622  0.238512
    5  0.769657 -0.040622 -0.841219
    6  0.141951  0.326064 -0.841219
    7 -1.694475 -0.523440 -0.841219
    8  0.352556 -0.551487 -1.639298
    9 -2.067324 -0.492617 -1.675794
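As a quick check on a small made-up frame, the apply version gives the same result as passing the Series of column means straight to fillna. apply with axis=0 hands each column to the lambda as a Series, so x.mean() is that column's own mean.

```python
import numpy as np
import pandas as pd

# Made-up frame with integer column labels, like the example above
df = pd.DataFrame({
    0: [1.0, np.nan, 5.0, np.nan],
    1: [2.0, 2.0, np.nan, 8.0],
})

# Each column is passed to the lambda as a Series; its own mean fills its NaNs
via_apply = df.apply(lambda x: x.fillna(x.mean()), axis=0)

# fillna with the Series of column means does the same in one call
via_fillna = df.fillna(df.mean())

print(via_apply)
print(via_apply.equals(via_fillna))  # True
```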
+19
Sep 09 '13 at 0:15
    import numpy as np
    import pandas as pd
    # sklearn.preprocessing.Imputer was removed in scikit-learn 0.22; SimpleImputer replaces it
    from sklearn.impute import SimpleImputer

    # Read the data from a CSV file
    dataset = pd.read_csv('Data.csv')

    # Split into inputs X (all but the last column) and target y (column 3)
    X = dataset.iloc[:, :-1].values
    y = dataset.iloc[:, 3].values

    # Replace NaN in columns 1 and 2 of X with each column's mean
    imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
    X[:, 1:3] = imputer.fit_transform(X[:, 1:3])
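If you only need the imputation and not scikit-learn itself, the same column-wise mean fill of columns 1 and 2 can be sketched in pandas before converting to an array. The column names and values below are hypothetical stand-ins for Data.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Data.csv: the last column is the target
dataset = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain"],
    "Age": [44.0, 27.0, np.nan, 38.0],
    "Salary": [72000.0, np.nan, 54000.0, 61000.0],
    "Purchased": ["No", "Yes", "No", "No"],
})

X = dataset.iloc[:, :-1].copy()  # features kept as a DataFrame for now
y = dataset.iloc[:, 3]

# Fill NaN in feature columns 1 and 2 with each column's own mean
cols = X.columns[1:3]
X[cols] = X[cols].fillna(X[cols].mean())

print(X)
```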
+9
Jul 10 '17 at 18:54

If you want to impute missing values with the mean column by column, this fills each column only with that column's own mean. It may be a little more readable.

    sub2['income'] = sub2['income'].fillna(sub2['income'].mean())
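The same pattern extends to several columns at once by selecting a list of them; each selected column is filled with its own mean and the others are left untouched. A sketch; the rent and score columns are made-up examples.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: only 'income' and 'rent' should be imputed
sub2 = pd.DataFrame({
    "income": [10.0, np.nan, 30.0],
    "rent": [np.nan, 5.0, 7.0],
    "score": [np.nan, 1.0, 2.0],
})

cols = ["income", "rent"]
# Each selected column gets its own mean; 'score' keeps its NaN
sub2[cols] = sub2[cols].fillna(sub2[cols].mean())

print(sub2)
```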
+7
Feb 26 '17 at 3:15

Another option, besides the above, is as follows:

    # Note: groupby(..., axis=1) is deprecated since pandas 2.1
    df = df.groupby(df.columns, axis=1).transform(lambda x: x.fillna(x.mean()))

It is less elegant than the previous answers for the mean, but it can be shorter if you want to replace nulls with some other column-wise function.
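For example, swapping in the median only means changing the lambda. This sketch uses DataFrame.transform, which applies the function column by column and so avoids the deprecated axis=1 grouping; the data is made up. For the median specifically, df.fillna(df.median()) gives the same result.

```python
import numpy as np
import pandas as pd

# Made-up data with one NaN per column
df = pd.DataFrame({
    "A": [1.0, 2.0, np.nan, 100.0],
    "B": [np.nan, 3.0, 5.0, 4.0],
})

# transform applies the lambda to each column, so any per-column
# statistic can be used; here each NaN becomes its column's median
filled = df.transform(lambda x: x.fillna(x.median()))

print(filled)
```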

+6
Nov 15 '16 at 19:40

Use df.fillna(df.mean()) directly to fill all null values with the mean of their column.
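One caveat: if the DataFrame also contains non-numeric columns, df.mean() raises a TypeError in pandas 2.0 and later, because numeric_only defaults to False there. A sketch with numeric_only=True, using column names borrowed from this answer and made-up values:

```python
import numpy as np
import pandas as pd

# Made-up frame mixing a numeric and a string column
df = pd.DataFrame({
    "Item_Weight": [9.3, np.nan, 17.5],
    "Outlet_Size": ["Medium", None, "Small"],
})

# numeric_only=True computes means for numeric columns only, so the string
# column neither breaks the mean nor gets filled
filled = df.fillna(df.mean(numeric_only=True))

print(filled)
```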

If you want to fill the null values of a single column with that column's mean, you can use this.

Suppose x = df['Item_Weight'], where Item_Weight is the name of the column.

Here we fill the null values of x with the mean of x and assign the result back:

    df['Item_Weight'] = df['Item_Weight'].fillna(df['Item_Weight'].mean())

If you want to fill the null values with a string instead, use this (here Outlet_Size is the column name):

    df.Outlet_Size = df.Outlet_Size.fillna('Missing')
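Both fills can also be done in a single fillna call by passing a dict that maps each column to its own fill value. A minimal sketch with made-up values:

```python
import numpy as np
import pandas as pd

# Made-up frame with a missing value in each column
df = pd.DataFrame({
    "Item_Weight": [9.3, np.nan, 17.5],
    "Outlet_Size": ["Medium", np.nan, "Small"],
})

# One fillna call: the mean for the numeric column,
# a fixed string for the categorical one
df = df.fillna({
    "Item_Weight": df["Item_Weight"].mean(),
    "Outlet_Size": "Missing",
})

print(df)
```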
+4
Jun 27 '18 at 22:19

Pandas: how to replace NaN (nan) values with the mean (average), median, or other statistic of a single column

Let's say your DataFrame is df and it has a column named nr_items, i.e. df['nr_items'].

If you want to replace the NaN values of your df['nr_items'] column with the column's mean:

Use the .fillna method:

    mean_value = df['nr_items'].mean()
    df['nr_item_ave'] = df['nr_items'].fillna(mean_value)

I created a new column in df named nr_item_ave to hold the values with NaN replaced by the column mean.

Be careful when using the mean: if your data has outliers, the median is recommended.
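A small illustration with made-up numbers: a single outlier pulls the mean far from the typical value, while the median barely moves. The nr_item_med column name is hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up counts with one large outlier (1000)
df = pd.DataFrame({"nr_items": [1.0, 2.0, np.nan, 3.0, 1000.0]})

print(df["nr_items"].mean())    # 251.5, pulled up by the outlier
print(df["nr_items"].median())  # 2.5

# Fill the NaN with the median instead of the mean
df["nr_item_med"] = df["nr_items"].fillna(df["nr_items"].median())

print(df)
```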

+2
Feb 04 '19 at 14:02