Sparse Matrix Division

Question

I have a scipy.sparse matrix of shape 45671x45671. In this matrix, some rows contain only the value "0".

My question is how to divide the values of each row by the sum of that row. Obviously it works with a for loop, but I'm looking for an efficient method ...

I already tried:

matrix / matrix.sum(1), but I get a MemoryError.
matrix / scs.csc_matrix(matrix.sum(axis=1)), but I get ValueError: inconsistent shapes.
Other stupid things ...

In addition, I want to skip rows that contain only "0" values.

So, if you have a solution ...

Thank you in advance!

python scipy matrix sparse-matrix

Paulo May 19 '17 at 23:30

1 answer

hpaulj · Accepted Answer · 2017-05-19T23:40:28+0000

I have M hanging around:

    In [241]: M
    Out[241]:
    <6x3 sparse matrix of type '<class 'numpy.uint8'>'
        with 6 stored elements in Compressed Sparse Row format>

    In [242]: M.A
    Out[242]:
    array([[1, 0, 0],
           [0, 1, 0],
           [0, 0, 1],
           [0, 1, 0],
           [0, 0, 1],
           [1, 0, 0]], dtype=uint8)

    In [243]: M.sum(1)      # dense matrix
    Out[243]:
    matrix([[1],
            [1],
            [1],
            [1],
            [1],
            [1]], dtype=uint32)

    In [244]: M/M.sum(1)    # dense matrix - full size of M
    Out[244]:
    matrix([[ 1.,  0.,  0.],
            [ 0.,  1.,  0.],
            [ 0.,  0.,  1.],
            [ 0.,  1.,  0.],
            [ 0.,  0.,  1.],
            [ 1.,  0.,  0.]])

This explains the memory error: M/M.sum(1) returns a dense matrix the full size of M, so if M is so large that M.A raises a MemoryError, this division will too.
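To put a number on that, here is a rough estimate of the dense result for the 45671x45671 matrix from the question. It assumes the division produces float64 values (8 bytes each), which is what true division gave in the small example above; the variable names are only for illustration.

```python
import numpy as np

n = 45671  # matrix is n x n, as stated in the question

# Assumption: the dense result holds float64 values, 8 bytes per element.
bytes_per_element = np.dtype(np.float64).itemsize
dense_bytes = n * n * bytes_per_element

print(f"{dense_bytes / 1e9:.1f} GB")  # about 16.7 GB for the dense result alone
```

That is for a single dense copy, regardless of how few nonzero elements the sparse matrix actually stores.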

    In [262]: S = sparse.csr_matrix(M.sum(1))

    In [263]: S.shape
    Out[263]: (6, 1)

    In [264]: M.shape
    Out[264]: (6, 3)

    In [265]: M/S
    ....
    ValueError: inconsistent shapes

I'm not quite sure what is going on here.

Element-wise multiplication does work:

    In [266]: M.multiply(S)
    Out[266]:
    <6x3 sparse matrix of type '<class 'numpy.uint32'>'
        with 6 stored elements in Compressed Sparse Row format>

Therefore, it should work if I build S as S = sparse.csr_matrix(1/M.sum(1))
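A minimal sketch of that idea on a small matrix whose row sums are all nonzero. The example values and the name `normalized` are mine, chosen only so the result is easy to check by eye.

```python
import numpy as np
from scipy import sparse

# Small illustrative matrix; every row has a nonzero sum.
M = sparse.csr_matrix(np.array([[1, 1, 0],
                                [0, 2, 2],
                                [4, 0, 0]], dtype=np.float64))

# M.sum(axis=1) is a small dense (n_rows, 1) result, so its reciprocal is cheap.
S = sparse.csr_matrix(1.0 / M.sum(axis=1))

# Element-wise multiply: the (3, 1) column is applied across each row of M.
normalized = M.multiply(S)

print(normalized.toarray())
```

The result stays sparse, so the full dense matrix is never built.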

If some of the rows sum to zero, you have a division by zero problem.

If I change M to have a zero row:

    In [283]: M.A
    Out[283]:
    array([[1, 0, 0],
           [0, 1, 0],
           [0, 0, 0],
           [0, 1, 0],
           [0, 0, 1],
           [1, 0, 0]], dtype=uint8)

    In [284]: S = sparse.csr_matrix(1/M.sum(1))
    /usr/local/bin/ipython3:1: RuntimeWarning: divide by zero encountered in true_divide
      #!/usr/bin/python3

    In [285]: S.A
    Out[285]:
    array([[  1.],
           [  1.],
           [ inf],
           [  1.],
           [  1.],
           [  1.]])

    In [286]: M.multiply(S)
    Out[286]:
    <6x3 sparse matrix of type '<class 'numpy.float64'>'
        with 5 stored elements in Compressed Sparse Row format>

    In [287]: _.A
    Out[287]:
    array([[ 1.,  0.,  0.],
           [ 0.,  1.,  0.],
           [ 0.,  0.,  0.],
           [ 0.,  1.,  0.],
           [ 0.,  0.,  1.],
           [ 1.,  0.,  0.]])

This is not the best M to demonstrate this, but it suggests a useful approach. The row sums are dense, so you can clean up their reciprocal (for example, replace the inf entries) using the usual dense array approaches before building S.
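One way to do that cleanup, as a sketch rather than a definitive implementation: compute the reciprocal only where the row sum is nonzero, and use a scale factor of 1 elsewhere so those rows are left as they are. The helper name `normalize_rows` is hypothetical, and scaling through a diagonal matrix is an alternative to the M.multiply(S) call above that I have swapped in here.

```python
import numpy as np
from scipy import sparse

def normalize_rows(M):
    """Hypothetical helper: divide each row by its sum, skipping zero-sum rows."""
    M = sparse.csr_matrix(M, dtype=np.float64)

    # Dense 1-D array of row sums; only n_rows values, so this is cheap.
    row_sums = np.asarray(M.sum(axis=1)).ravel()

    # Scale factor 1 leaves a row unchanged; fill in 1/sum where sum != 0.
    scale = np.ones_like(row_sums)
    nonzero = row_sums != 0
    scale[nonzero] = 1.0 / row_sums[nonzero]

    # Left-multiplying by a diagonal matrix scales row i by scale[i]
    # and keeps the result sparse.
    return sparse.diags(scale) @ M

example = sparse.csr_matrix(np.array([[1, 1, 0],
                                      [0, 0, 0],
                                      [2, 0, 2]]))
result = normalize_rows(example)
print(result.toarray())
```

Because the division only happens where the sum is nonzero, no RuntimeWarning is raised and no inf values are ever stored.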
