I am creating a sparse vector using scipy.sparse.csr_matrix, for example:
csr_matrix((values, (np.zeros(len(indices)), indices)), shape = (1, max_index))
This works fine for most of my data, but sometimes I get ValueError: could not convert integer scalar.
This reproduces the problem:
In [145]: inds
Out[145]:
array([ 827969148,  996833913, 1968345558,  898183169, 1811744124,
       2101454109,  133039182,  898183170,  919293479,  133039089])

In [146]: vals
Out[146]: array([ 1.,  1.,  1.,  1.,  1.,  2.,  1.,  1.,  1.,  1.])

In [147]: max_index
Out[147]: 2337713000

In [143]: csr_matrix((vals, (np.zeros(10), inds)), shape = (1, max_index+1))
...
    996         fn = _sparsetools.csr_sum_duplicates
    997         M,N = self._swap(self.shape)
--> 998         fn(M, N, self.indptr, self.indices, self.data)
    999
   1000         self.prune()
inds is an np.int64 array, and vals is an np.float64 array.
The relevant piece of scipy's sum_duplicates code is here.
Please note that this works:
In [235]: csr_matrix(([1,1], ([0,0], [1,2])), shape = (1, 2**34))
Out[235]:
<1x17179869184 sparse matrix of type '<type 'numpy.int64'>'
        with 2 stored elements in Compressed Sparse Row format>
So the problem is not simply that one of the dimensions is > 2^31.
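To put numbers on this, here is a small sketch that compares the values from the failing example with the int32 limit. It uses only NumPy. The names `n_cols`, `int32_max`, `indices_fit_int32` and `shape_fits_int32` are introduced here for illustration, and the check is not meant as a diagnosis of the error.

```python
import numpy as np

# Values copied from the failing example above.
inds = np.array([827969148, 996833913, 1968345558, 898183169, 1811744124,
                 2101454109, 133039182, 898183170, 919293479, 133039089],
                dtype=np.int64)
max_index = 2337713000
n_cols = max_index + 1  # second dimension passed as shape=(1, n_cols)

# Largest value representable by a signed 32-bit integer: 2**31 - 1.
int32_max = np.iinfo(np.int32).max

# Compare the largest column index and the column count with that limit.
indices_fit_int32 = bool(inds.max() <= int32_max)
shape_fits_int32 = bool(n_cols <= int32_max)

print("largest index:", inds.max(), "fits in int32:", indices_fit_int32)
print("number of columns:", n_cols, "fits in int32:", shape_fits_int32)
```

In the failing case every index fits in int32 while the column count does not. The same is true of the working example above (indices 1 and 2, 2**34 columns), so this comparison alone does not separate the two cases.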
Any thoughts on why these values should cause problems?