Dividing the Rows of a Sparse Matrix by Their Row Sums

I have a scipy.sparse matrix of shape 45671x45671. Some of its rows contain only zeros.

My question is how to divide the values of each row by the sum of that row. Obviously, it works with a for loop, but I'm looking for an efficient method ...

I already tried:

  • matrix / matrix.sum(1), but it raises a MemoryError.
  • matrix / scs.csc_matrix((matrix.sum(axis=1))), but it raises ValueError: inconsistent shapes.
  • Other stupid things ...

In addition, I want to skip the rows that contain only zeros.

So, if you have a solution ...

Thank you in advance!

1 answer

I have a small M lying around from another session:

    In [241]: M
    Out[241]:
    <6x3 sparse matrix of type '<class 'numpy.uint8'>'
        with 6 stored elements in Compressed Sparse Row format>

    In [242]: M.A
    Out[242]:
    array([[1, 0, 0],
           [0, 1, 0],
           [0, 0, 1],
           [0, 1, 0],
           [0, 0, 1],
           [1, 0, 0]], dtype=uint8)

    In [243]: M.sum(1)      # dense matrix
    Out[243]:
    matrix([[1],
            [1],
            [1],
            [1],
            [1],
            [1]], dtype=uint32)

    In [244]: M/M.sum(1)    # dense matrix - full size of M
    Out[244]:
    matrix([[ 1.,  0.,  0.],
            [ 0.,  1.,  0.],
            [ 0.,  0.,  1.],
            [ 0.,  1.,  0.],
            [ 0.,  0.,  1.],
            [ 1.,  0.,  0.]])

This explains the MemoryError: M/M.sum(1) returns a dense matrix of the full size of M. If M is so large that M.A would raise a MemoryError, this division will raise one too.
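As a rough back-of-the-envelope check, the size of that dense result can be estimated for the 45671x45671 matrix from the question. This sketch assumes the result is stored as float64 (8 bytes per element), which matches the float output above but is an assumption about the asker's data:

```python
import numpy as np

# Shape taken from the question; float64 result is an assumption.
n = 45671
dense_bytes = n * n * np.dtype(np.float64).itemsize

print(f"dense result needs about {dense_bytes / 1e9:.1f} GB")
```

That is far more than the sparse matrix itself needs, since the sparse format only stores the nonzero entries.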


    In [262]: S = sparse.csr_matrix(M.sum(1))

    In [263]: S.shape
    Out[263]: (6, 1)

    In [264]: M.shape
    Out[264]: (6, 3)

    In [265]: M/S
    ....
    ValueError: inconsistent shapes

I'm not quite sure what is going on here.

Element-wise multiplication, on the other hand, works:

    In [266]: M.multiply(S)
    Out[266]:
    <6x3 sparse matrix of type '<class 'numpy.uint32'>'
        with 6 stored elements in Compressed Sparse Row format>

Therefore, it should work if I build S as S = sparse.csr_matrix(1/M.sum(1))

If some of the rows sum to zero, you have a division-by-zero problem.


If I change M so that it has an all-zero row:

    In [283]: M.A
    Out[283]:
    array([[1, 0, 0],
           [0, 1, 0],
           [0, 0, 0],
           [0, 1, 0],
           [0, 0, 1],
           [1, 0, 0]], dtype=uint8)

    In [284]: S = sparse.csr_matrix(1/M.sum(1))
    /usr/local/bin/ipython3:1: RuntimeWarning: divide by zero encountered in true_divide
      #!/usr/bin/python3

    In [285]: S.A
    Out[285]:
    array([[  1.],
           [  1.],
           [ inf],
           [  1.],
           [  1.],
           [  1.]])

    In [286]: M.multiply(S)
    Out[286]:
    <6x3 sparse matrix of type '<class 'numpy.float64'>'
        with 5 stored elements in Compressed Sparse Row format>

    In [287]: _.A
    Out[287]:
    array([[ 1.,  0.,  0.],
           [ 0.,  1.,  0.],
           [ 0.,  0.,  0.],
           [ 0.,  1.,  0.],
           [ 0.,  0.,  1.],
           [ 1.,  0.,  0.]])

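This is not the best M to demonstrate this, since every nonzero row already sums to 1, but it suggests a useful approach. The row sums are dense, so you can clean up their reciprocals (for example, replace the inf entries) using the usual dense array techniques before multiplying.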
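Putting the pieces together, here is a minimal sketch of one way to do the whole normalization without ever building a dense matrix. The function name normalize_rows is made up for illustration, and instead of M.multiply(S) it multiplies from the left by a sparse diagonal matrix of reciprocals, which scales row i by the i-th diagonal entry. Rows whose sum is zero are given a scale of 1, so they are left unchanged (skipped) and no division by zero occurs:

```python
import numpy as np
from scipy import sparse


def normalize_rows(M):
    """Divide each row of a sparse matrix by its row sum.

    Hypothetical helper for illustration: rows whose sum is zero
    are skipped, i.e. returned unchanged.
    """
    M = sparse.csr_matrix(M).astype(np.float64)

    # The row sums come back as a dense (n, 1) matrix; flatten to 1-D.
    row_sums = np.asarray(M.sum(axis=1)).ravel()

    # Clean up the reciprocals on the dense side: only divide where
    # the sum is nonzero, and keep a scale of 1 everywhere else.
    scale = np.ones_like(row_sums)
    nonzero = row_sums != 0
    scale[nonzero] = 1.0 / row_sums[nonzero]

    # diag(scale) @ M multiplies row i of M by scale[i]; the result stays sparse.
    return sparse.diags(scale) @ M


A = sparse.csr_matrix(np.array([[1, 1, 0],
                                [0, 0, 0],
                                [2, 0, 2]], dtype=np.uint8))
out = normalize_rows(A)
print(out.toarray())
```

The only dense objects here are 1-D arrays with one entry per row, so for the 45671x45671 case the extra memory is a few hundred kilobytes rather than gigabytes.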


Source: https://habr.com/ru/post/1270324/

