Fill Pandas SparseDataFrame from SciPy Sparse Coo Matrix

(This question is related to "Populate a Pandas SparseDataFrame from a SciPy Sparse Matrix". I want to populate a SparseDataFrame from a scipy.sparse.coo_matrix specifically. The question above is about a different SciPy sparse matrix format (csr), so here it is.)

I noticed that Pandas now has support for sparse matrices and arrays. I am currently creating a DataFrame() like this:

 return DataFrame(matrix.toarray(), columns=features, index=observations) 

Is there a way to create a SparseDataFrame() from a scipy.sparse.coo_matrix()? Converting to dense format kills RAM badly!

1 answer

http://pandas.pydata.org/pandas-docs/stable/sparse.html#interaction-with-scipy-sparse

The convenience method SparseSeries.from_coo() is implemented to create a SparseSeries from a scipy.sparse.coo_matrix.
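As a minimal sketch of what this can look like: SparseSeries and SparseDataFrame were removed in pandas 1.0, so the example below assumes a newer pandas and uses the accessor equivalents pd.Series.sparse.from_coo and pd.DataFrame.sparse.from_spmatrix instead. The matrix values and the observations / features labels are made up for illustration.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Hypothetical 3x4 matrix with four stored (non-zero) entries.
row = np.array([0, 0, 1, 2])
col = np.array([0, 3, 1, 2])
data = np.array([1.0, 2.0, 3.0, 4.0])
matrix = sparse.coo_matrix((data, (row, col)), shape=(3, 4))

# Illustrative labels, standing in for the question's index and columns.
observations = ["obs0", "obs1", "obs2"]
features = ["f0", "f1", "f2", "f3"]

# Sparse-backed DataFrame, built without calling matrix.toarray().
df = pd.DataFrame.sparse.from_spmatrix(matrix, index=observations, columns=features)

# Series counterpart: one entry per stored value, indexed by (row, col).
ss = pd.Series.sparse.from_coo(matrix)

print(df.dtypes)
print(ss)
```

Each column of df holds a sparse array, so memory use should scale with the number of stored values rather than with the full shape.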

Within scipy.sparse there are methods that convert the formats to one another: .tocoo, .tocsc, etc. So you can use whichever format is best suited to a particular operation.
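A short sketch of those conversions, using a small made-up matrix:

```python
import numpy as np
from scipy import sparse

# Hypothetical 3x3 matrix in COO (coordinate) form.
values = np.array([1.0, 2.0, 3.0])
rows = np.array([0, 1, 2])
cols = np.array([2, 0, 1])
coo = sparse.coo_matrix((values, (rows, cols)), shape=(3, 3))

csr = coo.tocsr()   # compressed sparse row: supports indexing and fast row slicing
csc = coo.tocsc()   # compressed sparse column: fast column slicing
back = csr.tocoo()  # back to coordinate form, exposing .row, .col and .data

print(csr.format, csc.format, back.format)
```

COO is convenient for construction, while CSR and CSC are usually the better choice for slicing and arithmetic.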

For going the other way, I answered:

Pandas sparse dataFrame to sparse matrix, without generating a dense matrix in memory

The linked answer from 2013 iterates row by row, using toarray to make each row dense. I have not looked at what pandas from_coo does.

A more recent SO question on pandas sparse:

non-NDFFrame object error using pandas.SparseSeries.from_coo() function


From https://github.com/pydata/pandas/blob/master/pandas/sparse/scipy_sparse.py:

    def _coo_to_sparse_series(A, dense_index=False):
        """ Convert a scipy.sparse.coo_matrix to a SparseSeries.
        Use the defaults given in the SparseSeries constructor.
        """
        s = Series(A.data, MultiIndex.from_arrays((A.row, A.col)))
        s = s.sort_index()
        s = s.to_sparse()  # TODO: specify kind?
        # ...
        return s

In effect, it takes the same data, i, j that are used to construct the coo matrix, creates a Series from them, sorts it, and turns it into a sparse series.
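The same three steps can be sketched by hand. This is an illustration rather than the pandas source: it assumes a pandas version where to_sparse() no longer exists, so the last step uses astype with pd.SparseDtype as a stand-in, and the matrix entries are made up.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Hypothetical COO matrix; entries are deliberately not in sorted order.
values = np.array([4.0, 1.0, 3.0])
rows = np.array([2, 0, 1])
cols = np.array([1, 0, 2])
A = sparse.coo_matrix((values, (rows, cols)), shape=(3, 3))

# 1. A Series of the stored values, indexed by (row, col) pairs.
s = pd.Series(A.data, index=pd.MultiIndex.from_arrays((A.row, A.col)))

# 2. Sort by the (row, col) index.
s = s.sort_index()

# 3. Convert to a sparse-backed Series (stand-in for the removed to_sparse()).
s = s.astype(pd.SparseDtype(s.dtype))

print(s)
```

Only the stored values enter the Series, so no dense array of the full shape is created along the way.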


Source: https://habr.com/ru/post/1243689/

