Sparse matrix slicing using a list of ints

I am writing a machine learning algorithm on huge, sparse data: my matrix has shape (347, 5 416 812 801), but it is very sparse; only 0.13% of the entries are non-zero.

My sparse matrix takes 105,000 bytes (< 1 MB) and is in csr format.

I am trying to build train / test sets by selecting a list of example indices for each. So I want to split my data set in two using:

training_set = matrix[train_indices]

with shape (len(train_indices), 5 416 812 801), still sparse, and

testing_set = matrix[test_indices]

with shape (347 - len(train_indices), 5 416 812 801), also sparse,

where train_indices and test_indices are two lists of int.

But training_set = matrix[train_indices] seems to fail and returns Segmentation fault (core dumped).

Perhaps this is not a memory problem, since I am running this code on a server with 64 GB of RAM.
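
A reduced, runnable sketch of the split being attempted (the toy matrix and the way the index lists are built below are illustrative stand-ins; in the real code, matrix is the (347, 5 416 812 801) csr matrix):

import numpy as np
from scipy import sparse

# Toy stand-in for the real data: same 347 rows, but only 10 000 columns
# so it runs anywhere (the real matrix has 5 416 812 801 columns).
matrix = sparse.random(347, 10000, density=0.0013, format='csr', random_state=0)

rng = np.random.default_rng(0)
train_indices = rng.choice(matrix.shape[0], size=300, replace=False)
test_indices = np.setdiff1d(np.arange(matrix.shape[0]), train_indices)

training_set = matrix[train_indices]   # shape (300, 10000), still csr
testing_set = matrix[test_indices]     # shape (47, 10000), still csr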

Any idea what I am doing wrong, or how to do this split properly?


Row indexing of a csr matrix with a list of integers is actually performed as a matrix product; the equivalent 'extractor' matrix can be built from the indices like this:

import numpy as np
from scipy import sparse

def extractor(indices, N):
    # One row per index, a single 1 at that index's column: E * M selects those rows of M.
    indptr = np.arange(len(indices) + 1)
    data = np.ones(len(indices))
    shape = (len(indices), N)
    return sparse.csr_matrix((data, indices, indptr), shape=shape)
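
Each row of that extractor has a single 1, placed at the column given by the corresponding index, so multiplying it by M pulls out exactly those rows. A small illustration, continuing with the imports above (the indices [2, 0, 3] and N = 5 are arbitrary, chosen only to show the structure):

# The extractor for indices [2, 0, 3] of a matrix with N = 5 rows:
idx = [2, 0, 3]
E = sparse.csr_matrix((np.ones(len(idx)), idx, np.arange(len(idx) + 1)),
                      shape=(len(idx), 5))
print(E.toarray())
# [[0. 0. 1. 0. 0.]
#  [1. 0. 0. 0. 0.]
#  [0. 0. 0. 1. 0.]]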

Applied to a csr matrix:

In [185]: M
Out[185]: 
<30x40 sparse matrix of type '<class 'numpy.float64'>'
    with 76 stored elements in Compressed Sparse Row format>

In [186]: indices=np.r_[0:20]

In [187]: M[indices,:]
Out[187]: 
<20x40 sparse matrix of type '<class 'numpy.float64'>'
    with 57 stored elements in Compressed Sparse Row format>

In [188]: extractor(indices, M.shape[0])*M
Out[188]: 
<20x40 sparse matrix of type '<class 'numpy.float64'>'
    with 57 stored elements in Compressed Sparse Row format>

So row selection from a csr matrix with a list of indices is, in effect, such a matrix product with a matrix of 1s. The two approaches give the same result, and the timings are comparable (the explicit extractor is even slightly faster here):

In [189]: timeit M[indices,:]
1000 loops, best of 3: 515 µs per loop
In [190]: timeit extractor(indices, M.shape[0])*M
1000 loops, best of 3: 399 µs per loop

In your case the extractor matrix would have shape (len(training_indices), 347), with only len(training_indices) nonzero values. That is small.

Your matrix itself, stored as its (data, indices, indptr) arrays, is small too, so the objects involved should be well within the limits of python/numpy.

Does matrix.sum(axis=1) work? That too uses a matrix product, with a column vector of 1s. Or sparse.eye(347)*M, another matrix product?
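
A concrete sketch of those checks, on a toy stand-in (the matrix and train_indices built below are illustrative; only the operations themselves, the row sums, the identity product and the extractor product, come from this answer):

import numpy as np
from scipy import sparse

def extractor(indices, N):
    indptr = np.arange(len(indices) + 1)
    data = np.ones(len(indices))
    return sparse.csr_matrix((data, indices, indptr), shape=(len(indices), N))

# Toy stand-in with the same number of rows as the real matrix (347).
matrix = sparse.random(347, 100000, density=0.0013, format='csr', random_state=0)
train_indices = np.arange(300)

row_sums = matrix.sum(axis=1)          # matrix product with a column of 1s
eye_check = sparse.eye(347) * matrix   # identity product, should return matrix unchanged
training_set = extractor(train_indices, matrix.shape[0]) * matrix

print(training_set.shape)              # (300, 100000), same rows as matrix[train_indices]

If these products also crash on the real data, the failure is in the sparse matrix product itself rather than in the indexing syntax.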

