The fastest way to sum over sparse matrix rows

I have a large csr_matrix (1M x 1K) and I want to sum groups of its rows together to get a new csr_matrix with the same number of columns but a reduced number of rows. My problem is essentially the same as in Sum over rows in scipy.sparse.csr_matrix; the only issue is that the solution given there is too slow for my purpose. Say I have

map_fn = np.random.randint(0, 10000, 1000000)

map_fn tells how my input rows (1M) map to my output rows (10K): the i-th input row is added to output row map_fn[i]. I tried the two approaches mentioned in the question above, namely building a sparse mapping matrix and looping over sparse sums. Although the sparse-matrix approach looks much better than the looped-sum approach, I still find it too slow for my purpose. Here is the code comparing the two:

import scipy.sparse
import numpy as np 
import time

print "Setting up input"
s=10000
n=1000000
d=1000
density=1.0/500

X=scipy.sparse.rand(n,d,density=density,format="csr")
map_fn=np.random.randint(0, s, n)

# Approach 1
start_time=time.time()
col = scipy.arange(n)
val = np.ones(n)
S = scipy.sparse.csr_matrix( (val, (map_fn, col)), shape = (s,n))
print "Approach 1 Creation time : ",time.time()-start_time
SX = S.dot(X)
print "Approach 1 Total time : ",time.time()-start_time

#Approach 2
start_time=time.time()
SX = np.zeros((s,X.shape[1]))
for i in range(SX.shape[0]):
    SX[i,:] = X[np.where(map_fn==i)[0],:].sum(axis=0)

print "Approach 2 Total time : ",time.time()-start_time

which gives the following numbers:

Approach 1 Creation time :  0.187678098679
Approach 1 Total time :  0.286989927292
Approach 2 Total time :  10.208632946

So my question is: is there a better way to do this? Building the mapping matrix feels wasteful, since it takes more than half of the total time. Are there better alternatives? Any suggestions are welcome. Thanks.

1 answer

Adapting the sparse solution from this post -

def sparse_matrix_mult_sparseX_mod1(X, rows):   
    nrows = rows.max()+1
    ncols = X.shape[1]
    nelem = nrows * ncols

    a,b = X.nonzero()
    ids = rows[a] + b*nrows
    sums = np.bincount(ids, X[a,b].A1, minlength=nelem)
    out = sums.reshape(ncols,-1).T
    return out
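
The trick is to flatten each nonzero entry's (output row, column) pair into a single linear index, rows[a] + b*nrows, and let np.bincount accumulate the values that land in the same bin; the final reshape/transpose undoes the column-major flattening. As a small sanity check (my own addition, not from the original answer; the names Xs and rows_small are arbitrary), the result matches an explicit per-group sum on a tiny random input:

import numpy as np
import scipy.sparse

# Tiny input: 6 rows collapsed onto 3 output rows
Xs = scipy.sparse.rand(6, 3, density=0.5, format="csr")
rows_small = np.array([0, 1, 0, 2, 1, 2])

out = sparse_matrix_mult_sparseX_mod1(Xs, rows_small)
# Reference: explicitly sum the input rows belonging to each output row
ref = np.vstack([np.asarray(Xs[np.where(rows_small == g)[0]].sum(axis=0)).ravel()
                 for g in range(3)])
assert np.allclose(out, ref)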

Approach #1 from the question, put into a function for benchmarking -

def app1(X, map_fn):
    col = scipy.arange(n)
    val = np.ones(n)
    S = scipy.sparse.csr_matrix( (val, (map_fn, col)), shape = (s,n))
    SX = S.dot(X)
    return SX
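
The indicator matrix S has a single 1 at position (map_fn[i], i), so row j of S.dot(X) is the sum of all input rows i with map_fn[i] == j. A minimal toy check of that (my own sketch, not part of the original answer):

import numpy as np
import scipy.sparse

X_small = scipy.sparse.csr_matrix(np.array([[1., 0.],
                                            [0., 2.],
                                            [3., 0.],
                                            [0., 4.]]))
map_small = np.array([0, 1, 0, 1])   # rows 0,2 -> output row 0; rows 1,3 -> output row 1

S_small = scipy.sparse.csr_matrix((np.ones(4), (map_small, np.arange(4))), shape=(2, 4))
# Grouped row sums: output row 0 = [4, 0], output row 1 = [0, 6]
assert np.allclose(S_small.dot(X_small).toarray(), [[4., 0.], [0., 6.]])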

Timings -

In [209]: # Inputs setup
     ...: s=10000
     ...: n=1000000
     ...: d=1000
     ...: density=1.0/500
     ...: 
     ...: X=scipy.sparse.rand(n,d,density=density,format="csr")
     ...: map_fn=np.random.randint(0, s, n)
     ...: 

In [210]: out1 = app1(X, map_fn)
     ...: out2 = sparse_matrix_mult_sparseX_mod1(X, map_fn)
     ...: print np.allclose(out1.toarray(), out2)
     ...: 
True

In [211]: %timeit app1(X, map_fn)
1 loop, best of 3: 517 ms per loop

In [212]: %timeit sparse_matrix_mult_sparseX_mod1(X, map_fn)
10 loops, best of 3: 147 ms per loop

Since app1 returns a sparse matrix while the proposed function returns a dense array, a fairer comparison densifies the output of app1 -

In [214]: %timeit app1(X, map_fn).toarray()
1 loop, best of 3: 584 ms per loop

Numba solution

Alternatively, numba can be used to do the binned accumulation in a simple loop. Proposed implementation -

from numba import njit

@njit
def bincount_mod2(out, rows, r, C, V):
    N = len(V)
    for i in range(N):
        out[rows[r[i]], C[i]] += V[i]
    return out

def sparse_matrix_mult_sparseX_mod2(X, rows):
    nrows = rows.max()+1
    ncols = X.shape[1]
    r,C = X.nonzero()

    V = X[r,C].A1
    out = np.zeros((nrows, ncols))
    return bincount_mod2(out, rows, r, C, V)
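
As a quick correctness check before the big benchmark (my own addition, assuming numba is installed; the names Xs and rows_small are arbitrary), the numba variant agrees with the bincount-based version on a small random input:

import numpy as np
import scipy.sparse

Xs = scipy.sparse.rand(500, 40, density=0.05, format="csr")
rows_small = np.random.randint(0, 30, 500)

# Both functions compute the same grouped row sums
assert np.allclose(sparse_matrix_mult_sparseX_mod2(Xs, rows_small),
                   sparse_matrix_mult_sparseX_mod1(Xs, rows_small))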

Timings on a denser input -

In [373]: # Inputs setup
     ...: s=10000
     ...: n=1000000
     ...: d=1000
     ...: density=1.0/100 # Denser now!
     ...: 
     ...: X=scipy.sparse.rand(n,d,density=density,format="csr")
     ...: map_fn=np.random.randint(0, s, n)
     ...: 

In [374]: %timeit app1(X, map_fn)
1 loop, best of 3: 787 ms per loop

In [375]: %timeit sparse_matrix_mult_sparseX_mod1(X, map_fn)
1 loop, best of 3: 906 ms per loop

In [376]: %timeit sparse_matrix_mult_sparseX_mod2(X, map_fn)
1 loop, best of 3: 705 ms per loop

And app1 with its output densified -

In [379]: %timeit app1(X, map_fn).toarray()
1 loop, best of 3: 910 ms per loop
