Non-NDFrame object error when using pandas.SparseSeries.from_coo()

Question


I am trying to convert a COO-format sparse matrix (from scipy.sparse) into a pandas sparse series. The documentation ( http://pandas.pydata.org/pandas-docs/stable/sparse.html ) says to use SparseSeries.from_coo(A). The conversion itself appears to work, but when I try to look at the resulting series, this is what happens.

A 10x10 matrix seems OK:

    import pandas as pd
    import scipy.sparse as ss
    import numpy as np
    row = (np.random.random(10)*10).astype(int)
    col = (np.random.random(10)*10).astype(int)
    val = np.random.random(10)*10
    sparse = ss.coo_matrix((val,(row,col)),shape=(10,10))
    pss = pd.SparseSeries.from_coo(sparse)
    print pss

    0  7    1.416631
       9    5.833902
    1  0    4.131919
    2  3    2.820531
       7    2.227009
    3  1    9.205619
    4  4    8.309077
    6  0    4.376921
    7  6    8.444013
       7    7.383886
    dtype: float64
    BlockIndex
    Block locations: array([0])
    Block lengths: array([10])

But a 100x100 matrix is not:

    import pandas as pd
    import scipy.sparse as ss
    import numpy as np
    row = (np.random.random(100)*100).astype(int)
    col = (np.random.random(100)*100).astype(int)
    val = np.random.random(100)*100
    sparse = ss.coo_matrix((val,(row,col)),shape=(100,100))
    pss = pd.SparseSeries.from_coo(sparse)
    print pss

    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-790-f0c22a601b93> in <module>()
          7 sparse = ss.coo_matrix((val,(row,col)),shape=(100,100))
          8 pss = pd.SparseSeries.from_coo(sparse)
    ----> 9 print pss
         10

    C:\Users\ej\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\base.pyc in __str__(self)
         45         if compat.PY3:
         46             return self.__unicode__()
    ---> 47         return self.__bytes__()
         48
         49     def __bytes__(self):

    C:\Users\ej\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\base.pyc in __bytes__(self)
         57
         58         encoding = get_option("display.encoding")
    ---> 59         return self.__unicode__().encode(encoding, 'replace')
         60
         61     def __repr__(self):

    C:\Users\ej\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\sparse\series.pyc in __unicode__(self)
        287     def __unicode__(self):
        288         # currently, unicode is same as repr...fixes infinite loop
    --> 289         series_rep = Series.__unicode__(self)
        290         rep = '%s\n%s' % (series_rep, repr(self.sp_index))
        291         return rep

    C:\Users\ej\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\series.pyc in __unicode__(self)
        895
        896         self.to_string(buf=buf, name=self.name, dtype=self.dtype,
    --> 897                        max_rows=max_rows)
        898         result = buf.getvalue()
        899

    C:\Users\ej\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\series.pyc in to_string(self, buf, na_rep, float_format, header, length, dtype, name, max_rows)
        960         the_repr = self._get_repr(float_format=float_format, na_rep=na_rep,
        961                                   header=header, length=length, dtype=dtype,
    --> 962                                   name=name, max_rows=max_rows)
        963
        964         # catch contract violations

    C:\Users\ej\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\series.pyc in _get_repr(self, name, header, length, dtype, na_rep, float_format, max_rows)
        989                                         na_rep=na_rep,
        990                                         float_format=float_format,
    --> 991                                         max_rows=max_rows)
        992         result = formatter.to_string()
        993

    C:\Users\ej\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\format.pyc in __init__(self, series, buf, length, header, na_rep, name, float_format, dtype, max_rows)
        145         self.dtype = dtype
        146
    --> 147         self._chk_truncate()
        148
        149     def _chk_truncate(self):

    C:\Users\ej\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\format.pyc in _chk_truncate(self)
        158             else:
        159                 row_num = max_rows // 2
    --> 160                 series = concat((series.iloc[:row_num], series.iloc[-row_num:]))
        161             self.tr_row_num = row_num
        162             self.tr_series = series

    C:\Users\ej\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\tools\merge.pyc in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, copy)
        752                        keys=keys, levels=levels, names=names,
        753                        verify_integrity=verify_integrity,
    --> 754                        copy=copy)
        755     return op.get_result()
        756

    C:\Users\ej\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\tools\merge.pyc in __init__(self, objs, axis, join, join_axes, keys, levels, names, ignore_index, verify_integrity, copy)
        803         for obj in objs:
        804             if not isinstance(obj, NDFrame):
    --> 805                 raise TypeError("cannot concatenate a non-NDFrame object")
        806
        807         # consolidate

    TypeError: cannot concatenate a non-NDFrame object

I really don't understand the error message. I think I am following the example in the documentation to the letter, just using my own COO matrix. Could the size be the problem?



python numpy scipy pandas sparse-matrix

Francesco Aug 12 '15 at 15:52


1 answer

hpaulj · Answer 1 · 2015-12-10T01:19:02+0000

I have an older version of pandas. It has the sparse code, but not tocoo. The pandas issue that was filed in connection with this is: https://github.com/pydata/pandas/issues/10818

But I found this on GitHub:

    def _coo_to_sparse_series(A, dense_index=False):
        """ Convert a scipy.sparse.coo_matrix to a SparseSeries.
        Use the defaults given in the SparseSeries constructor. """
        s = Series(A.data, MultiIndex.from_arrays((A.row, A.col)))
        s = s.sort_index()
        s = s.to_sparse()  # TODO: specify kind?
        # ...
        return s

With a small sparse matrix, I can build the series and display it without any problem:

    In [259]: Asml=sparse.coo_matrix(np.arange(10*5).reshape(10,5))
    In [260]: s=pd.Series(Asml.data,pd.MultiIndex.from_arrays((Asml.row,Asml.col)))
    In [261]: s=s.sort_index()
    In [262]: s
    Out[262]:
    0  1     1
       2     2
       3     3
       4     4
    1  0     5
       1     6
       2     7
    [... mine]
       3    48
       4    49
    dtype: int32

    In [263]: ssml=s.to_sparse()
    In [264]: ssml
    Out[264]:
    0  1     1
       2     2
       3     3
       4     4
    1  0     5
    [... mine]
       2    47
       3    48
       4    49
    dtype: int32
    BlockIndex
    Block locations: array([0])
    Block lengths: array([49])

But with a larger array (more nonzero elements) I get a display error. My guess is that it happens once the display of the (plain) series starts using an ellipsis (...). I am working in Py3, so I get a different error message.

    ....\pandas\core\base.pyc in __str__(self)
         45         if compat.PY3:
         46             return self.__unicode__()  # py3
         47         return self.__bytes__()  # py2 route

For example:

    In [265]: Asml=sparse.coo_matrix(np.arange(10*7).reshape(10,7))
    In [266]: s=pd.Series(Asml.data,pd.MultiIndex.from_arrays((Asml.row,Asml.col)))
    In [267]: s=s.sort_index()
    In [268]: s
    Out[268]:
    0  1     1
       2     2
       3     3
       4     4
       5     5
       6     6
    1  0     7
       1     8
       2     9
       3    10
       4    11
       5    12
       6    13
    2  0    14
       1    15
    ...
    7  6    55
    8  0    56
       1    57
    [... mine]
    Length: 69, dtype: int32

    In [269]: ssml=s.to_sparse()
    In [270]: ssml
    Out[270]: <repr(<pandas.sparse.series.SparseSeries at 0xaff6bc0c>) failed: AttributeError: 'SparseArray' object has no attribute '_get_repr'>
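One way to see why 49 entries print fine while 69 (or 100) do not: the traceback in the question passes through _chk_truncate, which splits the series into a head and a tail via concat. The sketch below is a plain-Python imitation of that branch, not the pandas code itself. The function name truncate_for_display is made up for illustration, the length check is an assumption about when that branch runs, and 60 is assumed to be the default value of the display.max_rows option.

```python
def truncate_for_display(values, max_rows=60):
    """Return (shown_values, was_truncated).

    Imitates the head/tail split seen in the traceback:
        row_num = max_rows // 2
        series = concat((series.iloc[:row_num], series.iloc[-row_num:]))
    max_rows=60 is an assumed default, not read from pandas here.
    """
    if max_rows and len(values) > max_rows:
        row_num = max_rows // 2
        # This concatenation is the step that fails for the sparse series.
        return values[:row_num] + values[-row_num:], True
    return values, False


# Entry counts from the examples: 10 and 49 worked, 69 and 100 failed.
for n in (10, 49, 69, 100):
    shown, truncated = truncate_for_display(list(range(n)))
    print(n, len(shown), truncated)
```

If this reading is right, the size of the matrix matters only indirectly: what counts is whether the number of stored entries exceeds the row limit, so that the truncating code path is taken.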

I am not familiar enough with the pandas code and data structures to deduce much more at the moment.
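For readers on a recent pandas, here is a hedged sketch of the same conversion. It assumes pandas 0.25 or later, where the Series.sparse accessor provides from_coo; SparseSeries itself no longer exists in pandas 1.0 and later. The seed and the use of unique (row, col) positions are my own choices for a reproducible example, not part of the question.

```python
import numpy as np
import pandas as pd
import scipy.sparse as ss

rng = np.random.default_rng(0)

# Pick 100 distinct cells of a 100x100 matrix (assumption: no duplicate
# coordinates, unlike the random row/col arrays in the question).
flat = rng.choice(100 * 100, size=100, replace=False)
row, col = np.divmod(flat, 100)
val = rng.random(100) * 100

A = ss.coo_matrix((val, (row, col)), shape=(100, 100))

# Accessor-based replacement for SparseSeries.from_coo (pandas >= 0.25).
s = pd.Series.sparse.from_coo(A)

# The result is a regular Series with a sparse dtype and a (row, col) MultiIndex.
print(s.dtype)
print(len(s))
text = repr(s)  # building the repr should not raise on a recent pandas
```

On the old version discussed here, converting to a dense Series before printing may sidestep the failing display path, though I have not checked that against that release.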
