Cryptic scipy "could not convert integer scalar" error

I am constructing a sparse vector using scipy.sparse.csr_matrix, for example:

 csr_matrix((values, (np.zeros(len(indices)), indices)), shape = (1, max_index)) 

This works fine for most of my data, but occasionally I get ValueError: could not convert integer scalar.

This reproduces the problem:

 In [145]: inds
 Out[145]:
 array([ 827969148,  996833913, 1968345558,  898183169, 1811744124,
        2101454109,  133039182,  898183170,  919293479,  133039089])

 In [146]: vals
 Out[146]: array([ 1.,  1.,  1.,  1.,  1.,  2.,  1.,  1.,  1.,  1.])

 In [147]: max_index
 Out[147]: 2337713000

 In [143]: csr_matrix((vals, (np.zeros(10), inds)), shape = (1, max_index+1))
 ...
     996         fn = _sparsetools.csr_sum_duplicates
     997         M,N = self._swap(self.shape)
 --> 998         fn(M, N, self.indptr, self.indices, self.data)
     999
    1000         self.prune()  # nnz may have changed

 ValueError: could not convert integer scalar

inds is an np.int64 array and vals is an np.float64 array.

The relevant piece of scipy's sum_duplicates code is here.

Please note that this works:

 In [235]: csr_matrix(([1,1], ([0,0], [1,2])), shape = (1, 2**34))
 Out[235]:
 <1x17179869184 sparse matrix of type '<type 'numpy.int64'>'
     with 2 stored elements in Compressed Sparse Row format>

So the problem is not simply that one of the dimensions is > 2^31.

Any thoughts on why these values cause problems?

+5
3 answers

Could it be that max_index > 2**31? Try this, just to make sure:

 csr_matrix((vals, (np.zeros(10), inds//2)), shape = (1, max_index//2))
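To check that hypothesis against the numbers in the question, here is a minimal sketch in plain Python (no numpy needed). It only compares the values with the signed 32-bit limit; the helper name fits_int32 is made up for this illustration, and the sketch does not claim to show how scipy picks its index type internally.

```python
# Largest value a signed 32-bit integer can hold.
INT32_MAX = 2**31 - 1

# Values copied from the question.
inds = [827969148, 996833913, 1968345558, 898183169, 1811744124,
        2101454109, 133039182, 898183170, 919293479, 133039089]
max_index = 2337713000


def fits_int32(value):
    """Return True if value can be stored in a signed 32-bit integer.

    Hypothetical helper, not part of numpy or scipy.
    """
    return -2**31 <= value <= INT32_MAX


# The largest column index that is actually stored.
largest_index_fits = fits_int32(max(inds))
# The declared number of columns, i.e. the second entry of shape.
declared_width_fits = fits_int32(max_index + 1)

print(largest_index_fits, declared_width_fits)  # prints: True False
```

So every stored index is below 2**31, while the declared width max_index + 1 is not.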

+1

The maximum index you specify is less than the maximum index of the rows you supply.

Using np.max(inds)+1 as the second dimension works fine for me:

 sparse.csr_matrix((vals, (np.zeros(10), inds)), shape = (1, np.max(inds)+1))

Calling .todense() on the result, however, raises a memory error because the matrix is so large.
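As a sketch of that suggestion, assuming a reasonably recent numpy and scipy are installed: the width is derived from the largest stored index, and the result is inspected through its .data, .nnz and .shape attributes rather than through .todense(). The names rows, n_cols and vec are chosen here for illustration, and the row indices are created as integers instead of the float array returned by np.zeros(10).

```python
import numpy as np
from scipy.sparse import csr_matrix

# Values copied from the question.
inds = np.array([827969148, 996833913, 1968345558, 898183169, 1811744124,
                 2101454109, 133039182, 898183170, 919293479, 133039089],
                dtype=np.int64)
vals = np.array([1., 1., 1., 1., 1., 2., 1., 1., 1., 1.])

# All entries live in row 0; use an integer dtype for the row indices.
rows = np.zeros(len(inds), dtype=np.int64)

# Width taken from the data itself: largest column index plus one.
n_cols = int(inds.max()) + 1

vec = csr_matrix((vals, (rows, inds)), shape=(1, n_cols))

# Inspect the stored entries only; nothing here allocates a dense array.
print(vec.shape, vec.nnz, vec.data.sum())
```

This sidesteps the error only when the data allows a smaller width; it does not help if the vector really needs more than 2**31 columns.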

0

Uncommenting the sum_duplicates function will lead to other errors. However, the fix from "a strange error while creating csr_matrix" also solves your problem. You can extend the version check to newer versions of scipy.

 import scipy
 import scipy.sparse

 # Apply the workaround only on the scipy versions listed here.
 if scipy.__version__ in ("0.14.0", "0.14.1", "0.15.1"):
     _get_index_dtype = scipy.sparse.sputils.get_index_dtype

     def _my_get_index_dtype(*a, **kw):
         # Drop check_contents so the index values themselves are not
         # inspected when the index dtype is chosen.
         kw.pop('check_contents', None)
         return _get_index_dtype(*a, **kw)

     scipy.sparse.compressed.get_index_dtype = _my_get_index_dtype
     scipy.sparse.csr.get_index_dtype = _my_get_index_dtype
     scipy.sparse.bsr.get_index_dtype = _my_get_index_dtype
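One possible way to extend the version check without listing every release string is to compare parsed version tuples against a range. The sketch below uses only the standard library; version_tuple and needs_patch are hypothetical helpers, and the upper bound (0, 16, 0) in the usage lines is a placeholder, not a statement about which scipy release no longer needs the workaround.

```python
def version_tuple(version):
    """Turn a string such as '0.15.1' into (0, 15, 1).

    Parsing stops at the first component that has no leading digits,
    so '0.15.0.dev0' becomes (0, 15, 0).
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if digits == "":
            break
        parts.append(int(digits))
    return tuple(parts)


def needs_patch(version, first_affected, first_unaffected):
    """Return True if first_affected <= version < first_unaffected."""
    return first_affected <= version_tuple(version) < first_unaffected


# Placeholder bounds: adjust them to the releases you have actually tested.
print(needs_patch("0.15.1", (0, 14, 0), (0, 16, 0)))  # prints: True
print(needs_patch("0.13.3", (0, 14, 0), (0, 16, 0)))  # prints: False
```

In the snippet above, the membership test on scipy.__version__ could then be replaced by a call such as needs_patch(scipy.__version__, ...), with bounds you have verified yourself.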
0

Source: https://habr.com/ru/post/1015391/

