Pandas DataFrame: replace nan values with average of columns

Question

I have a pandas DataFrame filled mostly with real numbers, but it has several nan values in it.

How can I replace the nan values with the average of the column they are in?

This question is very similar to this one: numpy array: replace nan values with average of columns, but unfortunately the solution given there does not work for a pandas DataFrame.

+100

python pandas nan

piokuc Sep 08 '13 at 23:54

8 answers

Try:

 sub2['income'].fillna((sub2['income'].mean()), inplace=True)

+32

Ammar Shigri Oct. 16 '15 at 20:18

 In [16]: df = DataFrame(np.random.randn(10,3))
 In [17]: df.iloc[3:5,0] = np.nan
 In [18]: df.iloc[4:6,1] = np.nan
 In [19]: df.iloc[5:8,2] = np.nan
 In [20]: df
 Out[20]:
           0         1         2
 0  1.148272  0.227366 -2.368136
 1 -0.820823  1.071471 -0.784713
 2  0.157913  0.602857  0.665034
 3       NaN -0.985188 -0.324136
 4       NaN       NaN  0.238512
 5  0.769657       NaN       NaN
 6  0.141951  0.326064       NaN
 7 -1.694475 -0.523440       NaN
 8  0.352556 -0.551487 -1.639298
 9 -2.067324 -0.492617 -1.675794
 In [22]: df.mean()
 Out[22]:
 0   -0.251534
 1   -0.040622
 2   -0.841219
 dtype: float64

Apply per column the mean of that column and fill:

 In [23]: df.apply(lambda x: x.fillna(x.mean()),axis=0)
 Out[23]:
           0         1         2
 0  1.148272  0.227366 -2.368136
 1 -0.820823  1.071471 -0.784713
 2  0.157913  0.602857  0.665034
 3 -0.251534 -0.985188 -0.324136
 4 -0.251534 -0.040622  0.238512
 5  0.769657 -0.040622 -0.841219
 6  0.141951  0.326064 -0.841219
 7 -1.694475 -0.523440 -0.841219
 8  0.352556 -0.551487 -1.639298
 9 -2.067324 -0.492617 -1.675794
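
For anyone who wants to check this on a frame small enough to verify by hand, here is a minimal sketch. The column names `a` and `b` and the values are made up for illustration. It assumes an all-numeric frame, in which case the per-column `apply` should give the same result as passing the Series of column means to `fillna`.

```python
import numpy as np
import pandas as pd

# Small all-numeric frame; column names and values are hypothetical.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, 4.0, 6.0],
})

# Column by column: fill each column's NaNs with that column's mean.
by_apply = df.apply(lambda col: col.fillna(col.mean()), axis=0)

# One call: fillna aligns the Series of means on the column labels.
by_fillna = df.fillna(df.mean())

print(by_apply)
print(by_apply.equals(by_fillna))
```

Both routes leave the original `df` untouched and return a new frame.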

+19

Jeff Sep 09 '13 at 0:15

 import numpy as np
 import pandas as pd
 # Imputer was removed from sklearn.preprocessing in scikit-learn 0.22; SimpleImputer replaces it
 from sklearn.impute import SimpleImputer
 # Read the data from a CSV file
 Dataset = pd.read_csv('Data.csv')
 # Split the data into features X and target Y
 X = Dataset.iloc[:, :-1].values
 Y = Dataset.iloc[:, 3].values
 # Replace NaN in columns 1 and 2 of X with the mean of each column
 imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
 imputer = imputer.fit(X[:, 1:3])
 X[:, 1:3] = imputer.transform(X[:, 1:3])

+9

Roshan jha Jul 10 '17 at 18:54

If you want to impute missing values with the mean and go column by column, then this will fill a column using only that column's mean. It might be a little more readable.

 sub2['income'] = sub2['income'].fillna((sub2['income'].mean()))
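
A runnable sketch of the same single-column fill follows. The `sub2` frame here is a made-up stand-in, not the asker's data. Assigning the result back also avoids calling `fillna(..., inplace=True)` on a selected column, which, as far as I know, may not update the parent frame on recent pandas versions that use copy-on-write.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the sub2 frame used in the answer.
sub2 = pd.DataFrame({"income": [10.0, np.nan, 30.0], "age": [20, 30, 40]})

# mean() skips NaN by default, so this is the mean of the observed values.
income_mean = sub2["income"].mean()

# Assign the filled column back to the frame.
sub2["income"] = sub2["income"].fillna(income_mean)

print(sub2)
```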

+7

Pranay Aryal Feb 26 '17 at 3:15

Another option, besides the above, is as follows:

 df = df.groupby(df.columns, axis = 1).transform(lambda x: x.fillna(x.mean()))

It is less elegant than the previous answers for the mean, but it could be shorter if you want to replace nulls with some other column function.
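
To illustrate the "some other column function" idea, here is a sketch that swaps in `DataFrame.apply` instead of the `groupby` call above, since grouping along `axis=1` is, to my knowledge, deprecated in recent pandas releases. `fill_with_stat` is a hypothetical helper name, and the frame is made up.

```python
import numpy as np
import pandas as pd

def fill_with_stat(df, stat):
    """Fill each column's NaNs with stat(column). Hypothetical helper."""
    return df.apply(lambda col: col.fillna(stat(col)))

# Made-up frame with one NaN per column.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 100.0],
    "b": [2.0, 4.0, np.nan, 6.0],
})

# Any function that maps a column to a scalar can be plugged in.
filled_median = fill_with_stat(df, pd.Series.median)
filled_min = fill_with_stat(df, pd.Series.min)

print(filled_median)
print(filled_min)
```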

+6

guibor Nov 15 '16 at 19:40

Use df.fillna(df.mean()) directly to fill all null values with the mean.

If you want to fill the null values of a single column with the mean of that column, you can use the following.

Suppose x = df['Item_Weight'], where Item_Weight is the name of the column.

Here we assign back to x the result of filling the null values of x with the mean of x:

 df['Item_Weight'] = df['Item_Weight'].fillna((df['Item_Weight'].mean()))

If you want to fill the null values with some string, use the following, where Outlet_Size is the column name:

 df.Outlet_Size = df.Outlet_Size.fillna('Missing')
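
Putting the two cases together, a sketch for a frame with both a numeric and a text column might look like this. The data is made up; only the column names come from the answer. It assumes a pandas version where `mean` accepts `numeric_only=True`, so the text column is left out of the means and is not touched by the first fill.

```python
import numpy as np
import pandas as pd

# Made-up data; the column names follow the answer above.
df = pd.DataFrame({
    "Item_Weight": [9.0, np.nan, 15.0],
    "Outlet_Size": ["Small", None, "High"],
})

# Numeric columns: fill with their own means (text columns are skipped).
df = df.fillna(df.mean(numeric_only=True))

# Text column: fill with a placeholder label.
df["Outlet_Size"] = df["Outlet_Size"].fillna("Missing")

print(df)
```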

+4

Sunny Barnwal Jun 27 '18 at 22:19

Pandas: How to replace NaN (nan) values with the average (mean), median or other statistics of one column

Let's say your DataFrame is df and you have one column named nr_items. This is: df['nr_items']

If you want to replace the NaN values of your df['nr_items'] column with the mean value of the column:

Use the .fillna method:

 mean_value = df['nr_items'].mean()
 df['nr_item_ave'] = df['nr_items'].fillna(mean_value)

I created a new df column named nr_item_ave to store the result, with the NaN values replaced by the mean value of the column.

You should be careful when using the mean. If you have outliers, the median is recommended.
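
A small made-up example of why outliers matter here: a single large value drags the mean far away from the typical values, while the median stays close to them. The column name `nr_item_median` is hypothetical, chosen to mirror `nr_item_ave` above.

```python
import numpy as np
import pandas as pd

# Made-up data with one outlier (1000.0) and one missing value.
df = pd.DataFrame({"nr_items": [1.0, 2.0, 3.0, np.nan, 1000.0]})

mean_value = df["nr_items"].mean()      # pulled up by the outlier
median_value = df["nr_items"].median()  # stays near the typical values

# Store the median-filled values in a new, hypothetically named column.
df["nr_item_median"] = df["nr_items"].fillna(median_value)

print(mean_value, median_value)
print(df)
```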

+2

pink.slash Feb 04 '19 at 14:02

bmu · Accepted Answer · 2013-09-09 05:27

You can simply use DataFrame.fillna to fill the nan values directly:

 In [27]: df
 Out[27]:
           A         B         C
 0 -0.166919  0.979728 -0.632955
 1 -0.297953 -0.912674 -1.365463
 2 -0.120211 -0.540679 -0.680481
 3       NaN -2.027325  1.533582
 4       NaN       NaN  0.461821
 5 -0.788073       NaN       NaN
 6 -0.916080 -0.612343       NaN
 7 -0.887858  1.033826       NaN
 8  1.948430  1.025011 -2.982224
 9  0.019698 -0.795876 -0.046431
 In [28]: df.mean()
 Out[28]:
 A   -0.151121
 B   -0.231291
 C   -0.530307
 dtype: float64
 In [29]: df.fillna(df.mean())
 Out[29]:
           A         B         C
 0 -0.166919  0.979728 -0.632955
 1 -0.297953 -0.912674 -1.365463
 2 -0.120211 -0.540679 -0.680481
 3 -0.151121 -2.027325  1.533582
 4 -0.151121 -0.231291  0.461821
 5 -0.788073 -0.231291 -0.530307
 6 -0.916080 -0.612343 -0.530307
 7 -0.887858  1.033826 -0.530307
 8  1.948430  1.025011 -2.982224
 9  0.019698 -0.795876 -0.046431

The docstring of fillna says that value should be a scalar or a dict; however, it seems to work with a Series as well. If you want to pass a dict, you could use df.mean().to_dict().
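
As a quick check that the Series and dict forms behave the same, here is a sketch on a made-up three-column frame. The values are chosen so the means are easy to verify by hand.

```python
import numpy as np
import pandas as pd

# Made-up frame with one NaN per column.
df = pd.DataFrame({
    "A": [1.0, np.nan, 3.0],
    "B": [np.nan, 2.0, 4.0],
    "C": [5.0, 7.0, np.nan],
})

col_means = df.mean()  # Series indexed by column label

# fillna matches the fill values to columns by label in both forms.
filled_from_series = df.fillna(col_means)
filled_from_dict = df.fillna(col_means.to_dict())

print(filled_from_series)
print(filled_from_series.equals(filled_from_dict))
```

Note that `fillna` returns a new frame here, so `df` itself still contains the NaN values unless the result is assigned back.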
