The average value of nonzero values in a sparse matrix?

Question

I am trying to calculate the average of the nonzero values in each row of a compressed sparse row (CSR) matrix. Using the matrix's mean method does not work, because the zeros are included in the average:

    >>> from scipy.sparse import csr_matrix
    >>> a = csr_matrix([[0, 0, 2], [1, 3, 8]])
    >>> a.mean(axis=1)
    matrix([[ 0.66666667],
            [ 4.        ]])

The following works, but is slow for large matrices:

    >>> import numpy as np
    >>> b = np.zeros(a.shape[0])
    >>> for i in range(a.shape[0]):
    ...     b[i] = a.getrow(i).data.mean()
    ...
    >>> b
    array([ 2., 4.])

Can someone tell me if there is a faster method?

+5

python scipy sparse-matrix

batsc Dec 14 '15 at 11:21

3 answers

With a CSR format matrix, you can make it even easier:

    sums = a.sum(axis=1).A1
    counts = np.diff(a.indptr)
    averages = sums / counts

Row sums are directly supported, and the structure of the CSR format means that the difference between consecutive values in the indptr array is exactly the number of stored (nonzero) elements in each row.
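To make the indptr reasoning concrete, here is a minimal sketch on a made-up matrix `m` (not from the question) that has an empty middle row. The NaN guard for empty rows is my own assumption about what you would want there; it also assumes the matrix holds no explicitly stored zeros (call `m.eliminate_zeros()` first if it might).

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical example matrix with an empty middle row.
m = csr_matrix(np.array([[0, 0, 2], [0, 0, 0], [1, 3, 8]]))

# m.indptr[i]:m.indptr[i + 1] delimits row i inside m.data, so the
# differences are the per-row counts of stored entries.
counts = np.diff(m.indptr)
sums = np.asarray(m.sum(axis=1)).ravel()

# Divide only where a row has stored entries; empty rows stay NaN
# instead of triggering a division by zero.
averages = np.full(m.shape[0], np.nan)
np.divide(sums, counts, out=averages, where=counts > 0)

print(m.indptr)
print(counts)
print(averages)
```

Here `m.indptr` is `[0, 1, 1, 4]`, so the counts come out as 1, 0 and 3, and only the first and last rows get a real average.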

+5

perimosocordiae Dec 15 '15 at 17:16

I like to sum the values along whichever axis you are interested in and divide by the number of nonzero elements in the corresponding row or column.

Like so:

    sp_arr = csr_matrix([[0, 0, 2], [1, 3, 8]])
    col_avg = sp_arr.sum(0) / (sp_arr != 0).sum(0)
    row_avg = sp_arr.sum(1) / (sp_arr != 0).sum(1)
    print(col_avg)
    # matrix([[ 1., 3., 5.]])
    print(row_avg)
    # matrix([[ 2.], [ 4.]])

Basically, you sum all the entries along the given axis and divide by the number of True entries in sp_arr != 0 along that axis, which is the number of nonzero entries.

I find this approach less complicated and easier to read than the other options.
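If you want the same idea for either axis, it can be wrapped in a small helper. This is only a sketch: `nonzero_mean` is a hypothetical name, and returning NaN for rows or columns without nonzero entries is my assumption rather than anything the question asks for.

```python
import numpy as np
from scipy.sparse import csr_matrix


def nonzero_mean(mat, axis):
    """Mean of the nonzero entries along `axis` (hypothetical helper)."""
    # Sum of the values along the axis, flattened to a 1-D float array.
    sums = np.asarray(mat.sum(axis=axis)).ravel().astype(float)
    # (mat != 0) is a sparse boolean matrix; summing it counts nonzeros.
    counts = np.asarray((mat != 0).sum(axis=axis)).ravel()
    # Leave NaN wherever there is nothing to average.
    out = np.full(sums.shape, np.nan)
    np.divide(sums, counts, out=out, where=counts > 0)
    return out


sp_arr = csr_matrix([[0, 0, 2], [1, 3, 8]])
col_avg = nonzero_mean(sp_arr, axis=0)
row_avg = nonzero_mean(sp_arr, axis=1)
print(col_avg)
print(row_avg)
```

Because the count comes from the `!= 0` mask, explicitly stored zeros are not counted as nonzero entries.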

+1

Grr Sep 25 '17 at 21:43

Antonio Ragagnin · Accepted Answer · 2015-12-14T12:59:21+0000

This is a typical problem where you can use numpy.bincount. For this, I used three functions:

 (x,y,z)=scipy.sparse.find(a)

returns the row indices (x), column indices (y), and values (z) of the nonzero entries of a sparse matrix. For instance, here x is array([0, 1, 1, 1]).

numpy.bincount(x) returns, for each row index, the number of nonzero elements.

numpy.bincount(x, weights=z) returns, for each row, the sum of the nonzero elements.

Final working code:

    import numpy
    import scipy.sparse
    from scipy.sparse import csr_matrix

    a = csr_matrix([[0, 0, 2], [1, 3, 8]])
    (x, y, z) = scipy.sparse.find(a)
    countings = numpy.bincount(x)
    sums = numpy.bincount(x, weights=z)
    averages = sums / countings
    print(averages)

returns:

 [ 2. 4.]
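One caveat with bincount: its output is only as long as the largest row index that appears, so trailing all-zero rows are silently dropped. A minimal sketch of how this could be handled with the `minlength` argument, using a made-up matrix `m` whose last row is empty (filling such rows with NaN is my own assumption):

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical example matrix whose last row has no nonzero entries.
m = sp.csr_matrix(np.array([[0, 0, 2], [1, 3, 8], [0, 0, 0]]))
rows, cols, vals = sp.find(m)

# Without minlength, the result would only cover rows 0 and 1.
n_rows = m.shape[0]
counts = np.bincount(rows, minlength=n_rows)
sums = np.bincount(rows, weights=vals, minlength=n_rows)

# Rows without nonzero entries are left as NaN.
averages = np.full(n_rows, np.nan)
np.divide(sums, counts, out=averages, where=counts > 0)
print(averages)
```

With `minlength` set to the number of rows, the result always has one entry per row, whichever rows happen to be empty.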
