Using masked arrays
The standard way to do this using only numpy is to use the masked array module.
Scipy is a pretty heavy package that relies on external libraries, so it's worth having a numpy-only method. This borrows from @DonaldHobson's answer.
Edit: np.nanmean is now a numpy function. However, it doesn't handle all-nan columns ...
Suppose you have an array a:
>>> a
array([[  0.,  nan,  10.,  nan],
       [  1.,   6.,  nan,  nan],
       [  2.,   7.,  12.,  nan],
       [  3.,   8.,  nan,  nan],
       [ nan,   9.,  14.,  nan]])
>>> import numpy as np
>>> import numpy.ma as ma
>>> np.where(np.isnan(a), ma.array(a, mask=np.isnan(a)).mean(axis=0), a)
array([[  0. ,   7.5,  10. ,   0. ],
       [  1. ,   6. ,  12. ,   0. ],
       [  2. ,   7. ,  12. ,   0. ],
       [  3. ,   8. ,  12. ,   0. ],
       [  1.5,   9. ,  14. ,   0. ]])
Note that the masked array's mean does not need to be the same shape as a, because we take advantage of implicit broadcasting over rows.
Also notice how nicely the all-nan column is handled. The mean comes out as zero, since you are taking the mean of zero elements. The method using nanmean doesn't handle all-nan columns:
>>> col_mean = np.nanmean(a, axis=0)
/home/praveen/.virtualenvs/numpy3-mkl/lib/python3.4/site-packages/numpy/lib/nanfunctions.py:675: RuntimeWarning: Mean of empty slice
  warnings.warn("Mean of empty slice", RuntimeWarning)
>>> inds = np.where(np.isnan(a))
>>> a[inds] = np.take(col_mean, inds[1])
>>> a
array([[  0. ,   7.5,  10. ,   nan],
       [  1. ,   6. ,  12. ,   nan],
       [  2. ,   7. ,  12. ,   nan],
       [  3. ,   8. ,  12. ,   nan],
       [  1.5,   9. ,  14. ,   nan]])
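If you would rather stay with np.nanmean, one possible workaround is to silence the warning and patch the all-nan columns yourself. This is only a sketch: the function name fill_nan_with_nanmean and its fallback parameter are made up for illustration, and using 0.0 as the fallback is an assumption chosen to match the masked-array result above.

```python
import warnings

import numpy as np


def fill_nan_with_nanmean(a, fallback=0.0):
    """Return a copy of `a` with NaNs replaced by their column's nanmean.

    `fallback` (hypothetical parameter) is used for columns that are all NaN.
    """
    a = np.asarray(a, dtype=float)
    with warnings.catch_warnings():
        # np.nanmean emits "Mean of empty slice" for all-nan columns
        warnings.simplefilter("ignore", category=RuntimeWarning)
        col_mean = np.nanmean(a, axis=0)
    # All-nan columns come back as nan; substitute the fallback there
    col_mean = np.where(np.isnan(col_mean), fallback, col_mean)
    # col_mean has shape (ncols,), so it broadcasts over the rows of a
    return np.where(np.isnan(a), col_mean, a)


a = np.array([[0., np.nan, 10., np.nan],
              [1., 6., np.nan, np.nan],
              [2., 7., 12., np.nan],
              [3., 8., np.nan, np.nan],
              [np.nan, 9., 14., np.nan]])
filled = fill_nan_with_nanmean(a)
```

Unlike the in-place a[inds] = ... assignment above, this version leaves the original array untouched.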
Explanation
Converting a to a masked array gives you
>>> ma.array(a, mask=np.isnan(a))
masked_array(data =
 [[0.0 -- 10.0 --]
 [1.0 6.0 -- --]
 [2.0 7.0 12.0 --]
 [3.0 8.0 -- --]
 [-- 9.0 14.0 --]],
             mask =
 [[False  True False  True]
 [False False  True  True]
 [False False False  True]
 [False False  True  True]
 [ True False False  True]],
       fill_value = 1e+20)
And taking the mean over columns gives you the correct answer, normalizing only over the non-masked values:
>>> ma.array(a, mask=np.isnan(a)).mean(axis=0)
masked_array(data = [1.5 7.5 12.0 --],
             mask = [False False False  True],
       fill_value = 1e+20)
Further, note how nicely the mask handles the column that is all-nan!
Finally, np.where does the replacement.
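The same steps can be spelled out one at a time, which makes it easier to inspect each intermediate. This is a minimal sketch, not a different method: the only change is that the value for the all-nan column is set explicitly with .filled(0.0) rather than left to whatever data sits under the mask. The variable names are arbitrary.

```python
import numpy as np
import numpy.ma as ma

a = np.array([[0., np.nan, 10., np.nan],
              [1., 6., np.nan, np.nan],
              [2., 7., 12., np.nan],
              [3., 8., np.nan, np.nan],
              [np.nan, 9., 14., np.nan]])

nan_mask = np.isnan(a)                    # True wherever a value is missing
masked = ma.array(a, mask=nan_mask)       # step 1: hide the nans behind a mask
col_mean = masked.mean(axis=0)            # step 2: mean over non-masked values only
col_fill = col_mean.filled(0.0)           # explicit value for the all-nan column
result = np.where(nan_mask, col_fill, a)  # step 3: replace nans, keep the rest
```

Here col_mean is itself a masked array whose last entry is masked, because that column has nothing to average.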
Row-wise mean
To replace nan values with the row-wise mean instead of the column-wise mean, a tiny change is required for broadcasting to work:
>>> a
array([[  0.,   1.,   2.,   3.,  nan],
       [ nan,   6.,   7.,   8.,   9.],
       [ 10.,  nan,  12.,  nan,  14.],
       [ nan,  nan,  nan,  nan,  nan]])
>>> np.where(np.isnan(a), ma.array(a, mask=np.isnan(a)).mean(axis=1), a)
ValueError: operands could not be broadcast together with shapes (4,5) (4,) (4,5)
>>> np.where(np.isnan(a), ma.array(a, mask=np.isnan(a)).mean(axis=1)[:, np.newaxis], a)
array([[  0. ,   1. ,   2. ,   3. ,   1.5],
       [  7.5,   6. ,   7. ,   8. ,   9. ],
       [ 10. ,  12. ,  12. ,  12. ,  14. ],
       [  0. ,   0. ,   0. ,   0. ,   0. ]])
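If you need both variants, the column-wise and row-wise cases can be folded into one helper by re-inserting the reduced axis before broadcasting. This is a sketch under a few assumptions: the name fill_nan_with_mean and the fallback parameter are hypothetical, and the input is assumed to be 2-D.

```python
import numpy as np
import numpy.ma as ma


def fill_nan_with_mean(a, axis=0, fallback=0.0):
    """Replace NaNs in a 2-D array with the mean taken along `axis`.

    `fallback` (hypothetical parameter) fills slices that are entirely NaN.
    """
    a = np.asarray(a, dtype=float)
    nan_mask = np.isnan(a)
    # Mean over non-masked values; all-nan slices stay masked until filled
    means = ma.array(a, mask=nan_mask).mean(axis=axis).filled(fallback)
    # Put the reduced axis back so the means broadcast against a,
    # equivalent to [:, np.newaxis] when axis=1
    means = np.expand_dims(means, axis)
    return np.where(nan_mask, means, a)


a = np.array([[0., 1., 2., 3., np.nan],
              [np.nan, 6., 7., 8., 9.],
              [10., np.nan, 12., np.nan, 14.],
              [np.nan, np.nan, np.nan, np.nan, np.nan]])
row_filled = fill_nan_with_mean(a, axis=1)
col_filled = fill_nan_with_mean(a, axis=0)
```

With axis=1 this should reproduce the [:, np.newaxis] result above; with axis=0 the extra dimension is harmless, since a (1, 5) array broadcasts over rows just like a (5,) one.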