How to calculate sparsity percentage for numpy array / matrix?

Question

I have the following array / matrix of size 10 by 5, which contains several NaN values:

array([[  0.,   0.,   0.,   0.,   1.],
       [  1.,   1.,   0.,  nan,  nan],
       [  0.,  nan,   1.,  nan,  nan],
       [  1.,   1.,   1.,   1.,   0.],
       [  0.,   0.,   0.,   1.,   0.],
       [  0.,   0.,   0.,   0.,  nan],
       [ nan,  nan,   1.,   1.,   1.],
       [  0.,   1.,   0.,   1.,   0.],
       [  1.,   0.,   1.,   0.,   0.],
       [  0.,   1.,   0.,   0.,   0.]])

How can I determine exactly how sparse this array is? Is there a simple function in numpy to measure the percentage of missing values?

+4

python arrays numpy matrix sparse-matrix

ShanZhengYang Aug 1 '16 at 21:44

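3 answers

"hpaulj" has already explained how to measure the percentage of missing values.

This answer addresses the first part of the question, assuming the array contains zero and non-zero values...

Sparsity refers to the zero values and density to the non-zero values. If the array is X, count its non-zero values:

non_zero = np.count_nonzero(X)

Total number of values in X:

total_val = np.prod(X.shape)

Sparsity is then:

sparsity = (total_val - non_zero)/total_val

And density is:

density = non_zero/total_val

Sparsity and density must sum to 100%...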

+1

Arun Kumar Khattri 25 . '18 14:28

Definition: sparsity is the number of zero-valued elements divided by the total number of elements.

Code for the general case without NaNs:

from numpy import array
from numpy import count_nonzero

# create dense matrix
A = array([[1, 1, 0, 1, 0, 0], [1, 0, 2, 0, 0, 1], [99, 0, 0, 2, 0, 0]])

# calculate sparsity
sparsity = 1.0 - ( count_nonzero(A) / float(A.size) )
print(sparsity)

Results:

0.555555555556
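
Note that np.count_nonzero counts NaN as non-zero, so the code above would treat the NaN entries of the question's array as ordinary non-zero values. A minimal sketch of one way to handle that, measuring the share of zeros among the non-NaN entries only. The name zero_fraction_ignoring_nan is a hypothetical helper, not a NumPy function, and returning NaN for an array with no valid entries is an assumption:

```python
import numpy as np

def zero_fraction_ignoring_nan(arr):
    """Fraction of zeros among the non-NaN entries (hypothetical helper)."""
    arr = np.asarray(arr, dtype=float)
    valid = ~np.isnan(arr)              # True where a real value is present
    n_valid = np.count_nonzero(valid)
    if n_valid == 0:
        return float("nan")             # assumption: nothing valid to measure
    n_zero = np.count_nonzero(arr[valid] == 0)
    return n_zero / n_valid

B = np.array([[0.0, np.nan], [1.0, 0.0]])
print(zero_fraction_ignoring_nan(B))    # 2 zeros out of 3 valid values
```

Whether NaN should count as "sparse", as "dense", or be excluded altogether depends on what the missing values mean in your data; this sketch excludes them.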

+1

seralouk Jan 30 '19 at 2:04

hpaulj · Accepted Answer · 2016-08-01T22:25:58+0000

np.isnan(a).sum()

gives the number of nan values, 8 in this example.

np.prod(a.shape)

is the total number of values, here 50. Their ratio should give the desired value.

In [1081]: np.isnan(a).sum()/np.prod(a.shape)
Out[1081]: 0.16
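
Since the question asks for a percentage, the ratio above can be wrapped in a small function. This is only a sketch: nan_percentage is a hypothetical name, and raising on an empty array is an assumption made to avoid dividing by zero.

```python
import numpy as np

def nan_percentage(arr):
    """Percentage of entries that are NaN (hypothetical helper)."""
    arr = np.asarray(arr, dtype=float)
    if arr.size == 0:
        raise ValueError("array is empty")   # assumption: reject empty input
    # arr.size equals np.prod(arr.shape) for an ndarray
    return float(100.0 * np.isnan(arr).sum() / arr.size)

nan = np.nan
a = np.array([[0, 0, 0, 0, 1],
              [1, 1, 0, nan, nan],
              [0, nan, 1, nan, nan],
              [1, 1, 1, 1, 0],
              [0, 0, 0, 1, 0],
              [0, 0, 0, 0, nan],
              [nan, nan, 1, 1, 1],
              [0, 1, 0, 1, 0],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 0, 0]], dtype=float)

print(nan_percentage(a))   # 8 NaN values out of 50
```

For the array in the question this gives 16.0, matching the ratio of 0.16.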

You might also find it useful to make a masked array from it:

In [1085]: a_ma=np.ma.masked_invalid(a)
In [1086]: print(a_ma)
[[0.0 0.0 0.0 0.0 1.0]
 [1.0 1.0 0.0 -- --]
 [0.0 -- 1.0 -- --]
 [1.0 1.0 1.0 1.0 0.0]
 [0.0 0.0 0.0 1.0 0.0]
 [0.0 0.0 0.0 0.0 --]
 [-- -- 1.0 1.0 1.0]
 [0.0 1.0 0.0 1.0 0.0]
 [1.0 0.0 1.0 0.0 0.0]
 [0.0 1.0 0.0 0.0 0.0]]

Number of valid values:

In [1089]: a_ma.compressed().shape
Out[1089]: (42,)
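
The masked array can also report these counts directly, without compressing it first. A small sketch on a made-up 2 by 3 array (the data here is illustrative, not the array from the question):

```python
import numpy as np

# Illustrative data: 6 entries, 2 of them NaN
a = np.array([[0.0, np.nan, 1.0],
              [np.nan, 1.0, 0.0]])
a_ma = np.ma.masked_invalid(a)

n_valid = a_ma.count()                 # number of unmasked (valid) entries
n_masked = np.ma.count_masked(a_ma)    # number of masked (NaN) entries
missing_fraction = n_masked / a_ma.size

print(n_valid, n_masked, missing_fraction)
```

Here count() plays the same role as a_ma.compressed().shape[0], and missing_fraction is the same ratio as np.isnan(a).sum()/np.prod(a.shape).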
