Subset of a matrix multiplication, fast and sparse

Converting a collaborative-filtering code to use sparse matrices, I am puzzled by the following problem: given two full matrices X (m by l) and Theta (n by l) and a sparse matrix R (m by n), is there a fast way to calculate the sparse inner product, i.e. the entries of X.dot(Theta.T) only at the nonzero positions of R? The large dimensions are m and n (order 100000), while l is small (order 10). This is probably a fairly common operation for big data, since it shows up in the cost function of most linear regression problems, so I would expect a solution built into scipy.sparse, but I have not found anything obvious so far.

The naive way to do this in Python is R.multiply(X.dot(Theta.T)), but that evaluates the full X.dot(Theta.T) matrix (m by n, order 100000 ** 2), which takes up far too much memory, only to throw away most of the entries because R is sparse.
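For concreteness, here is a minimal sketch of the setup and the naive approach (small sizes so the dense product fits in memory; scipy.sparse.random is assumed available, and all names are illustrative):

    import numpy as np
    import scipy.sparse as sp

    m, n, l = 1000, 1200, 10
    X = np.random.rand(m, l)
    Theta = np.random.rand(n, l)
    R = sp.random(m, n, density=0.01, format='coo')  # sparse m-by-n matrix

    # naive: materializes the full m-by-n dense product before masking by R
    P_naive = R.multiply(X.dot(Theta.T))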

There is a pseudo-solution at https://stackoverflow.com/a/146505/ ..., but it is non-sparse in one step:

    def sparse_mult_notreally(a, b, coords):
        rows, cols = coords
        rows, r_idx = np.unique(rows, return_inverse=True)
        cols, c_idx = np.unique(cols, return_inverse=True)
        C = np.array(np.dot(a[rows, :], b[:, cols]))  # this operation is dense
        return sp.coo_matrix((C[r_idx, c_idx], coords), (a.shape[0], b.shape[1]))

This works fine, and fast, on fairly small arrays, but on my large datasets it dies with the following error (the np.dot still builds a dense block of shape (number of unique rows) by (number of unique columns)):

    ... in sparse_mult(a, b, coords)
        132     rows, r_idx = np.unique(rows, return_inverse=True)
        133     cols, c_idx = np.unique(cols, return_inverse=True)
    --> 134     C = np.array(np.dot(a[rows, :], b[:, cols]))  # this operation is not sparse
        135     return sp.coo_matrix( (C[r_idx,c_idx],coords), (a.shape[0],b.shape[1]) )

    ValueError: array is too big.

A solution that is actually sparse but very slow:

    def sparse_mult(a, b, coords):
        rows, cols = coords
        n = len(rows)
        C = np.array([float(a[rows[i], :] * b[:, cols[i]]) for i in range(n)])  # this is sparse, but VERY slow
        return sp.coo_matrix((C, coords), (a.shape[0], b.shape[1]))

Does anyone know a quick, completely sparse way to do this?

+4
3 answers

I have profiled four different solutions to your problem, and it looks like, for any array size, the Numba jit solution is the best. A close second is @Alexander's Cython solution.

Here are the results (M is the number of rows in the x array):

    M = 1000
    function sparse_dense took 0.03 sec.
    function sparse_loop took 0.07 sec.
    function sparse_numba took 0.00 sec.
    function sparse_cython took 0.09 sec.

    M = 10000
    function sparse_dense took 2.88 sec.
    function sparse_loop took 0.68 sec.
    function sparse_numba took 0.00 sec.
    function sparse_cython took 0.01 sec.

    M = 100000
    function sparse_dense ran out of memory
    function sparse_loop took 6.84 sec.
    function sparse_numba took 0.09 sec.
    function sparse_cython took 0.12 sec.

The script I used to profile these methods:

    import numpy as np
    from scipy.sparse import coo_matrix
    from numba import autojit, jit, float64, int32
    import pyximport
    pyximport.install(setup_args={"script_args": ["--compiler=mingw32"],
                                  "include_dirs": np.get_include()},
                      reload_support=True)

    def sparse_dense(a, b, c):
        return coo_matrix(c.multiply(np.dot(a, b)))

    def sparse_loop(a, b, c):
        """Multiply sparse matrix `c` by np.dot(a, b) by looping over non-zero
        entries in `c` and using `np.dot()` for each entry."""
        N = c.size
        data = np.empty(N, dtype=float)
        for i in range(N):
            data[i] = c.data[i] * np.dot(a[c.row[i], :], b[:, c.col[i]])
        return coo_matrix((data, (c.row, c.col)), shape=(a.shape[0], b.shape[1]))

    #@autojit
    def _sparse_mult4(a, b, cd, cr, cc):
        N = cd.size
        data = np.empty_like(cd)
        for i in range(N):
            num = 0.0
            for j in range(a.shape[1]):
                num += a[cr[i], j] * b[j, cc[i]]
            data[i] = cd[i] * num
        return data

    # the kernel returns a 1-d array, so the declared return type is float64[:]
    _fast_sparse_mult4 = \
        jit(float64[:](float64[:, :], float64[:, :], float64[:], int32[:], int32[:]))(_sparse_mult4)

    def sparse_numba(a, b, c):
        """Multiply sparse matrix `c` by np.dot(a, b) using Numba jit."""
        assert c.shape == (a.shape[0], b.shape[1])
        data = _fast_sparse_mult4(a, b, c.data, c.row, c.col)
        return coo_matrix((data, (c.row, c.col)), shape=(a.shape[0], b.shape[1]))

    def sparse_cython(a, b, c):
        """Computes c.multiply(np.dot(a, b)) using Cython."""
        from sparse_mult_c import sparse_mult_c
        data = np.empty_like(c.data)
        sparse_mult_c(a, b, c.data, c.row, c.col, data)
        return coo_matrix((data, (c.row, c.col)), shape=(a.shape[0], b.shape[1]))

    def unique_rows(a):
        a = np.ascontiguousarray(a)
        unique_a = np.unique(a.view([('', a.dtype)] * a.shape[1]))
        return unique_a.view(a.dtype).reshape((unique_a.shape[0], a.shape[1]))

    if __name__ == '__main__':
        import time

        for M in [1000, 10000, 100000]:
            print 'M = %i' % M
            N = M + 2
            L = 10
            x = np.random.rand(M, L)
            t = np.random.rand(N, L).T

            # S non-zero entries in the sparse r matrix
            S = M * 10
            row = np.random.randint(M, size=S)
            col = np.random.randint(N, size=S)
            # remove duplicate (row, col) pairs
            row, col = unique_rows(np.dstack((row, col)).squeeze()).T
            data = np.random.rand(row.size)
            r = coo_matrix((data, (row, col)), shape=(M, N))

            # reference result for the correctness check below
            a2 = sparse_loop(x, t, r)
            for f in [sparse_dense, sparse_loop, sparse_numba, sparse_cython]:
                t0 = time.time()
                try:
                    a = f(x, t, r)
                except MemoryError:
                    print 'function %s ran out of memory' % f.__name__
                    continue
                elapsed = time.time() - t0
                try:
                    diff = abs(a - a2)
                    if diff.nnz > 0:
                        assert np.max(abs(a - a2).data) < 1e-5
                except AssertionError:
                    print f.__name__
                    raise
                print 'function %s took %.2f sec.' % (f.__name__, elapsed)
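Note that the script above uses the old Numba API (autojit, and calling jit with an explicit signature), which newer Numba versions no longer support. As a sketch only, assuming a recent Numba (this is not part of the original benchmark), the same kernel would look like:

    import numpy as np
    from numba import njit

    @njit
    def _sparse_mult4_njit(a, b, cd, cr, cc):
        # same kernel as _sparse_mult4: for each stored entry (cr[i], cc[i]),
        # accumulate the dot product of row cr[i] of `a` with column cc[i]
        # of `b`, then scale by the stored value cd[i]
        data = np.empty_like(cd)
        for i in range(cd.size):
            num = 0.0
            for j in range(a.shape[1]):
                num += a[cr[i], j] * b[j, cc[i]]
            data[i] = cd[i] * num
        return data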

The Cython function is a slightly modified version of @Alexander's code (it also multiplies by the stored entries of c, via the extra data argument):

    # working from tutorial at: http://docs.cython.org/src/tutorial/numpy.html
    cimport numpy as np
    # Turn bounds checking back on if there are ANY problems!
    cimport cython

    @cython.boundscheck(False)  # turn off bounds-checking for entire function
    def sparse_mult_c(np.ndarray[np.float64_t, ndim=2] a,
                      np.ndarray[np.float64_t, ndim=2] b,
                      np.ndarray[np.float64_t, ndim=1] data,
                      np.ndarray[np.int32_t, ndim=1] rows,
                      np.ndarray[np.int32_t, ndim=1] cols,
                      np.ndarray[np.float64_t, ndim=1] out):
        cdef int n = rows.shape[0]
        cdef int k = a.shape[1]
        cdef int i, j
        cdef double num
        for i in range(n):
            num = 0.0
            for j in range(k):
                num += a[rows[i], j] * b[j, cols[i]]
            out[i] = data[i] * num
+3

Based on the extra information in the comments, I think what is throwing you off is the call to np.unique. Try the following approach:

    import numpy as np
    import scipy.sparse as sps
    from numpy.core.umath_tests import inner1d

    n = 100000
    x = np.random.rand(n, 10)
    theta = np.random.rand(n, 10)
    rows = np.arange(n)
    cols = np.arange(n)
    np.random.shuffle(rows)
    np.random.shuffle(cols)

    def sparse_multiply(x, theta, rows, cols):
        data = inner1d(x[rows], theta[cols])
        return sps.coo_matrix((data, (rows, cols)),
                              shape=(x.shape[0], theta.shape[0]))

I get the following timings:

    n = 1000
    %timeit sparse_multiply(x, theta, rows, cols)
    1000 loops, best of 3: 465 us per loop

    n = 10000
    %timeit sparse_multiply(x, theta, rows, cols)
    100 loops, best of 3: 4.29 ms per loop

    n = 100000
    %timeit sparse_multiply(x, theta, rows, cols)
    10 loops, best of 3: 61.5 ms per loop

And of course, a correctness check with n = 100:

    >>> np.allclose(sparse_multiply(x, theta, rows, cols).toarray()[rows, cols],
    ...             x.dot(theta.T)[rows, cols])
    True
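One caveat: inner1d lives in numpy.core.umath_tests, a private module that newer NumPy versions deprecate, so it may be unavailable. An equivalent row-wise inner product using np.einsum (my substitution, not part of the original answer) would be:

    def sparse_multiply_einsum(x, theta, rows, cols):
        # row-wise inner products <x[rows[i]], theta[cols[i]]> without inner1d
        data = np.einsum('ij,ij->i', x[rows], theta[cols])
        return sps.coo_matrix((data, (rows, cols)),
                              shape=(x.shape[0], theta.shape[0]))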
+1

I have not tested Jaime's answer yet (thanks again!), but in the meantime I implemented another approach that works, using Cython.

The sparse_mult_c.pyx file:

    # working from tutorial at: http://docs.cython.org/src/tutorial/numpy.html
    cimport numpy as np
    # Turn bounds checking back on if there are ANY problems!
    cimport cython

    @cython.boundscheck(False)  # turn off bounds-checking for entire function
    def sparse_mult_c(np.ndarray[np.float64_t, ndim=2] a,
                      np.ndarray[np.float64_t, ndim=2] b,
                      np.ndarray[np.int32_t, ndim=1] rows,
                      np.ndarray[np.int32_t, ndim=1] cols,
                      np.ndarray[np.float64_t, ndim=1] C):
        cdef int n = rows.shape[0]
        cdef int k = a.shape[1]
        cdef int i, j
        for i in range(n):
            for j in range(k):
                C[i] += a[rows[i], j] * b[j, cols[i]]

Then compile it as described in http://docs.cython.org/src/userguide/tutorial.html
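For reference, a minimal build script in the spirit of that tutorial might look like the sketch below (the module name matches the .pyx file above; the rest is my assumption of a standard Cython/NumPy setup), compiled with python setup.py build_ext --inplace:

    # setup.py -- minimal sketch for building sparse_mult_c.pyx
    from distutils.core import setup
    from Cython.Build import cythonize
    import numpy as np

    setup(
        ext_modules=cythonize("sparse_mult_c.pyx"),
        include_dirs=[np.get_include()],  # so the generated C finds the numpy headers
    )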

Then in my Python code I use the following:

    def sparse_mult(a, b, coords):
        # a, b are np.ndarrays
        from sparse_mult_c import sparse_mult_c
        rows, cols = coords
        C = np.zeros(rows.shape[0])
        sparse_mult_c(a, b, rows, cols, C)
        return sp.coo_matrix((C, coords), (a.shape[0], b.shape[1]))
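In the notation of the question, a hypothetical call (my names, assuming R is a scipy.sparse matrix) would pass the nonzero pattern of R:

    # evaluate X.dot(Theta.T) only at the stored entries of R (COO format)
    Rcoo = R.tocoo()
    P = sparse_mult(X, Theta.T, (Rcoo.row, Rcoo.col))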

This is completely sparse and runs faster than even the original, memory-inefficient dense solution.

0

Source: https://habr.com/ru/post/1502097/

