Treat nan as zero in summing a numpy array, except nan in all arrays

Question

I have two numpy arrays, NS and EW, that I want to sum. Each of them has missing values in different positions, for example:

NS = array([[  1.,   2.,  nan],
       [  4.,   5.,  nan],
       [  6.,  nan,  nan]])
EW = array([[  1.,   2.,  nan],
       [  4.,  nan,  nan],
       [  6.,  nan,   9.]])

How can I sum them with numpy so that nan is treated as zero when only one array has nan at a location, but the result stays nan when both arrays have nan at the same location?

The result that I expect to see:

SUM = array([[  2.,   4.,  nan],
           [  8.,  5.,  nan],
           [  12.,  nan,   9.]])

When I try

SUM=np.add(NS,EW)

it gives me

SUM=array([[  2.,   4.,  nan],
       [  8.,  nan,  nan],
       [ 12.,  nan,  nan]])

When I try

SUM = np.nansum(np.dstack((NS,EW)),2)

it gives me

SUM=array([[  2.,   4.,   0.],
       [  8.,   5.,   0.],
       [ 12.,   0.,   9.]])

Of course, I can achieve my goal with an element-wise loop,

for i in range(np.size(NS,0)):
    for j in range(np.size(NS,1)):
        if np.isnan(NS[i,j]) and np.isnan(EW[i,j]):
            SUM[i,j] = np.nan
        elif np.isnan(NS[i,j]):
            SUM[i,j] = EW[i,j]
        elif np.isnan(EW[i,j]):
            SUM[i,j] = NS[i,j]
        else:
            SUM[i,j] = NS[i,j]+EW[i,j]

but it is very slow. Therefore, I am looking for a more numpy-idiomatic (vectorized) solution to this problem.

Thanks for the help in advance!

+4

python numpy nan missing-data

Superstar Feb 13 '17 at 17:21

3

You could use nansum and then set the positions where all inputs are nan back to nan:

def add_ignore_nans(a, b):
    stacked = np.array([a, b])
    res = np.nansum(stacked, axis=0)
    res[np.all(np.isnan(stacked), axis=0)] = np.nan
    return res

>>> add_ignore_nans(a, b)
array([[  2.,   4.,  nan],
       [  8.,   5.,  nan],
       [ 12.,  nan,   9.]])
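
The same nansum-and-mask idea extends to more than two arrays, which matches the "nan in all arrays" wording of the title. The following is only a sketch under that assumption; the helper name add_ignore_nans_many is hypothetical and not part of the original answer, and it assumes all inputs share one shape.

```python
import numpy as np

def add_ignore_nans_many(*arrays):
    """Sum same-shaped arrays, treating NaN as 0 unless every
    array is NaN at that position (hypothetical helper)."""
    stacked = np.stack(arrays)                # shape: (n_arrays, ...)
    all_nan = np.isnan(stacked).all(axis=0)   # True where nothing is valid
    res = np.nansum(stacked, axis=0)          # NaN counts as 0 here
    res[all_nan] = np.nan                     # restore NaN where all were NaN
    return res

NS = np.array([[1., 2., np.nan], [4., 5., np.nan], [6., np.nan, np.nan]])
EW = np.array([[1., 2., np.nan], [4., np.nan, np.nan], [6., np.nan, 9.]])
print(add_ignore_nans_many(NS, EW))
```

With only NS and EW as inputs this should reproduce the output shown above; adding a third array only changes the positions where that array holds a valid value.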

See also @Divakar's answer :-)

+2

MSeifert Feb 13 '17 at 18:50

With a = NS and b = EW:

na = numpy.isnan(a)
nb = numpy.isnan(b)
a[na] = 0
b[nb] = 0
a += b
na &= nb
a[na] = numpy.nan

Note that this modifies a and b in place. The result is in a.
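
If the inputs must stay untouched, the same steps can be run on copies. This is a minimal sketch, assuming float inputs of equal shape; the function name add_nan_as_zero is hypothetical and does not come from the answer.

```python
import numpy as np

def add_nan_as_zero(a, b):
    """Same steps as above, but on copies (hypothetical wrapper)."""
    a = np.array(a, dtype=float)   # np.array copies by default
    b = np.array(b, dtype=float)
    na = np.isnan(a)
    nb = np.isnan(b)
    a[na] = 0                      # NaN -> 0 in the copies only
    b[nb] = 0
    a += b
    a[na & nb] = np.nan            # both were NaN: keep NaN
    return a

NS = np.array([[1., 2., np.nan], [4., 5., np.nan], [6., np.nan, np.nan]])
EW = np.array([[1., 2., np.nan], [4., np.nan, np.nan], [6., np.nan, 9.]])
total = add_nan_as_zero(NS, EW)
print(total)
```

The cost is two extra array copies, which may matter for very large inputs.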

+1

Benjamin Feb 13 '17 at 18:23

Divakar · Accepted Answer · 2017-02-13T17:27:32+0000

Approach #1: using np.where -

def sum_nan_arrays(a,b):
    ma = np.isnan(a)
    mb = np.isnan(b)
    return np.where(ma&mb, np.nan, np.where(ma,0,a) + np.where(mb,0,b))

Sample run -

In [43]: NS
Out[43]: 
array([[  1.,   2.,  nan],
       [  4.,   5.,  nan],
       [  6.,  nan,  nan]])

In [44]: EW
Out[44]: 
array([[  1.,   2.,  nan],
       [  4.,  nan,  nan],
       [  6.,  nan,   9.]])

In [45]: sum_nan_arrays(NS, EW)
Out[45]: 
array([[  2.,   4.,  nan],
       [  8.,   5.,  nan],
       [ 12.,  nan,   9.]])

Approach #2: using boolean-indexing -

def sum_nan_arrays_v2(a,b):
    ma = np.isnan(a)
    mb = np.isnan(b)
    m_keep_a = ~ma & mb
    m_keep_b = ma & ~mb
    out = a + b
    out[m_keep_a] = a[m_keep_a]
    out[m_keep_b] = b[m_keep_b]
    return out
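
As a quick check, the boolean-indexing version can be run on the question's sample arrays. The sketch below repeats the function with comments so that it runs on its own; the expected array is the one given in the question, and np.testing.assert_array_equal is used because it treats nan entries at matching positions as equal.

```python
import numpy as np

def sum_nan_arrays_v2(a, b):
    ma = np.isnan(a)
    mb = np.isnan(b)
    m_keep_a = ~ma & mb            # only b is NaN: keep a's value
    m_keep_b = ma & ~mb            # only a is NaN: keep b's value
    out = a + b                    # NaN wherever either input is NaN
    out[m_keep_a] = a[m_keep_a]
    out[m_keep_b] = b[m_keep_b]
    return out                     # still NaN where both inputs are NaN

NS = np.array([[1., 2., np.nan], [4., 5., np.nan], [6., np.nan, np.nan]])
EW = np.array([[1., 2., np.nan], [4., np.nan, np.nan], [6., np.nan, 9.]])
expected = np.array([[2., 4., np.nan], [8., 5., np.nan], [12., np.nan, 9.]])

result = sum_nan_arrays_v2(NS, EW)
np.testing.assert_array_equal(result, expected)  # raises if they differ
print(result)
```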

Runtime test -

In [140]: # Setup input arrays with 4/9 ratio of NaNs (same as in the question)
     ...: a = np.random.rand(3000,3000)
     ...: b = np.random.rand(3000,3000)
     ...: a.ravel()[np.random.choice(range(a.size), size=4000000, replace=0)] = np.nan
     ...: b.ravel()[np.random.choice(range(b.size), size=4000000, replace=0)] = np.nan
     ...: 

In [141]: np.nanmax(np.abs(sum_nan_arrays(a, b) - sum_nan_arrays_v2(a, b))) # Verify
Out[141]: 0.0

In [142]: %timeit sum_nan_arrays(a, b)
10 loops, best of 3: 141 ms per loop

In [143]: %timeit sum_nan_arrays_v2(a, b)
10 loops, best of 3: 177 ms per loop

In [144]: # Setup input arrays with fewer NaNs
     ...: a = np.random.rand(3000,3000)
     ...: b = np.random.rand(3000,3000)
     ...: a.ravel()[np.random.choice(range(a.size), size=4000, replace=0)] = np.nan
     ...: b.ravel()[np.random.choice(range(b.size), size=4000, replace=0)] = np.nan
     ...: 

In [145]: np.nanmax(np.abs(sum_nan_arrays(a, b) - sum_nan_arrays_v2(a, b))) # Verify
Out[145]: 0.0

In [146]: %timeit sum_nan_arrays(a, b)
10 loops, best of 3: 69.6 ms per loop

In [147]: %timeit sum_nan_arrays_v2(a, b)
10 loops, best of 3: 38 ms per loop
