We use the pandas DataFrame as the main container for our time series data. We pickle each DataFrame into a binary blob and store it in a MongoDB document, along with keys holding metadata about the time series blob.
We ran into an error when upgrading pandas from 0.14.1 to 0.15.2.
Create the binary blob from a pandas DataFrame (0.14.1):
    import lz4
    import cPickle
    bd = lz4.compress(cPickle.dumps(df, cPickle.HIGHEST_PROTOCOL))
Error case: read the blob back from MongoDB with pandas 0.15.2:
    cPickle.loads(lz4.decompress(bd))
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-37-76f7b0b41426> in <module>()
    ----> 1 cPickle.loads(lz4.decompress(bd))
    TypeError: ('_reconstruct: First argument must be a sub-type of ndarray', <built-in function _reconstruct>, (<class 'pandas.core.index.Index'>, (0,), 'b'))
Success case: reading the same blob back from MongoDB with pandas 0.14.1 works without errors.
This looks similar to an older Stack Overflow thread, "Pandas compiled from source: default pickle behavior changed", which has a helpful comment from Jeff (https://stackoverflow.com/users/644898/jeff):
"The error message that you are seeing, `TypeError: _reconstruct: First argument must be a sub-type of ndarray`, is that the Python default unpickler makes sure that the class hierarchy that was pickled is exactly the same as what it is recreating. Since Series has changed between versions, this is no longer possible with the default unpickler (this IMHO is a bug in the way pickle works). In any event, pandas will unpickle pre-0.13 pickles that have Series objects."
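To make the quoted explanation concrete, here is a minimal sketch of how a class lookup can be intercepted while unpickling. It is written for Python 3 (where `pickle` replaces `cPickle`) and uses only the standard library. The `Index` class, the module name `old_pkg_index`, and the helpers `dump_as_old_layout` and `loads_with_remap` are all hypothetical stand-ins and are not pandas code. The failure simulated here is a missing import path rather than the `_reconstruct` TypeError above, so treat it as an illustration of the general remapping technique, not as a fix for this specific case.

```python
import io
import pickle
import sys
import types


class Index:
    """Hypothetical stand-in for a class whose import path changed between releases."""

    def __init__(self, values):
        self.values = list(values)


def dump_as_old_layout(obj):
    """Pickle obj as if Index still lived in the (hypothetical) module old_pkg_index."""
    old_module = types.ModuleType("old_pkg_index")
    old_module.Index = Index
    sys.modules["old_pkg_index"] = old_module
    original_module = Index.__module__
    Index.__module__ = "old_pkg_index"
    try:
        return pickle.dumps(obj, protocol=2)
    finally:
        # Undo the simulation: after this, old_pkg_index no longer exists.
        Index.__module__ = original_module
        del sys.modules["old_pkg_index"]


class RemappingUnpickler(pickle.Unpickler):
    """Unpickler that redirects known old (module, name) pairs to current classes."""

    RENAMED = {("old_pkg_index", "Index"): Index}

    def find_class(self, module, name):
        target = self.RENAMED.get((module, name))
        if target is not None:
            return target
        return super().find_class(module, name)


def loads_with_remap(data):
    """Load pickled bytes, applying the remapping table above."""
    return RemappingUnpickler(io.BytesIO(data)).load()
```

With this in place, `pickle.loads(blob)` fails on a blob produced by `dump_as_old_layout`, while `loads_with_remap(blob)` returns a regular `Index` instance. A real compatibility layer would need a remapping table for every class that changed, which is why relying on the library's own loader is usually preferable where one exists.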
Any ideas on workarounds or solutions?
To reproduce the error:
Setup in pandas 0.14.1 env:
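    import cPickle
    import numpy as np
    import pandas as pd
    df = pd.DataFrame(np.random.randn(10, 10))
    cPickle.dump(df, open("cp0141.p", "wb"))
    cPickle.load(open("cp0141.p", "rb"))  # no error; binary mode is the safe choice for pickles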
Trigger the error in pandas 0.15.2 env:
    cPickle.load(open("cp0141.p", "rb"))
    TypeError: ('_reconstruct: First argument must be a sub-type of ndarray', <built-in function _reconstruct>, (<class 'pandas.core.index.Int64Index'>, (0,), 'b'))
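Since the blobs are already stored next to metadata keys, one possible mitigation is to record the writer's library version with each blob, so that a version mismatch is reported clearly instead of surfacing as an opaque unpickling error. The sketch below is an assumption-laden illustration in Python 3: `pack_block`, `unpack_block`, and the `writer_version` key are hypothetical names, `zlib` stands in for `lz4` to keep the example standard-library only, and in practice the version string would come from `pd.__version__`.

```python
import pickle
import zlib


def pack_block(obj, writer_version):
    """Build a storable document: compressed pickle plus the writer's library version."""
    payload = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    return {"blob": zlib.compress(payload), "writer_version": writer_version}


def unpack_block(doc, reader_version):
    """Unpickle only if major.minor versions match; otherwise raise a clear error."""
    written = doc["writer_version"]
    if written.split(".")[:2] != reader_version.split(".")[:2]:
        raise RuntimeError(
            "block written with version %s, but reader has %s" % (written, reader_version)
        )
    return pickle.loads(zlib.decompress(doc["blob"]))


# Example: a block written under 0.14.1 is refused by a 0.15.2 reader.
doc = pack_block({"a": [1.0, 2.0]}, "0.14.1")
same_version_result = unpack_block(doc, "0.14.1")
```

This does not make old pickles readable by itself. It only lets the reader detect affected blocks, so they can be routed to a compatibility loader or re-written from an environment that still has the old version installed.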