Pandas backward compatibility issue with pickles between 0.14.1 and 0.15.2

We use the pandas DataFrame as the main data container for our time series data. We pack each DataFrame into a binary blob and store it in a MongoDB document, along with keys holding metadata about the time series blob.

We ran into an error when upgrading pandas from 0.14.1 to 0.15.2.

Create the binary blob from a pandas DataFrame (0.14.1):

    import lz4
    import cPickle
    bd = lz4.compress(cPickle.dumps(df, cPickle.HIGHEST_PROTOCOL))

Error case: read the blob back from MongoDB with pandas 0.15.2:

    cPickle.loads(lz4.decompress(bd))
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-37-76f7b0b41426> in <module>()
    ----> 1 cPickle.loads(lz4.decompress(bd))

    TypeError: ('_reconstruct: First argument must be a sub-type of ndarray', <built-in function _reconstruct>, (<class 'pandas.core.index.Index'>, (0,), 'b'))

Success case: reading the same blob back from MongoDB with pandas 0.14.1 works without errors.

This looks similar to an older Stack Overflow thread, "Pandas compiled from source: default pickle behavior changed", which has a helpful comment from https://stackoverflow.com/users/644898/jeff:

"The error message you are seeing, `TypeError: _reconstruct: First argument must be a sub-type of ndarray`, arises because the Python default unpickler makes sure that the class hierarchy that was pickled is exactly the same as the one it is recreating. Since Series has changed between versions, this is no longer possible with the default unpickler (this IMHO is a bug in the way pickle works). In any event, pandas will unpickle pre-0.13 pickles that have Series objects."

Any ideas on workarounds or solutions?

To reproduce the error:

Setup in pandas 0.14.1 env:

    import cPickle
    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.random.randn(10, 10))
    cPickle.dump(df, open("cp0141.p", "wb"))
    cPickle.load(open("cp0141.p", "rb"))  # no error

Trigger the error in pandas 0.15.2 env:

    cPickle.load(open("cp0141.p", "rb"))

    TypeError: ('_reconstruct: First argument must be a sub-type of ndarray', <built-in function _reconstruct>, (<class 'pandas.core.index.Int64Index'>, (0,), 'b'))
1 answer

This was explicitly called out as an API change: the Index classes are no longer subclasses of ndarray but of a pandas object (see the pandas 0.15.0 release notes).

You just need to use pd.read_pickle to read the pickles.
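Since the blobs in the question live in memory (compressed bytes pulled from MongoDB) rather than in a file, here is a minimal sketch of one way to route them through pd.read_pickle. It assumes only that pd.read_pickle accepts a file path, so the decompressed bytes are written to a temporary file first. The helper name `read_pickled_blob` and its `decompress` parameter are hypothetical, introduced here for illustration.

```python
import os
import tempfile

import pandas as pd


def read_pickled_blob(blob, decompress=None):
    """Unpickle a pandas object from an in-memory blob via pd.read_pickle.

    Hypothetical helper for illustration. `decompress` is an optional
    callable (for example lz4.decompress for blobs written as in the
    question) applied to the blob before unpickling.
    """
    raw = decompress(blob) if decompress is not None else blob

    # Write the raw pickle bytes to a temporary file, because this sketch
    # only assumes that pd.read_pickle can read from a file path.
    fd, path = tempfile.mkstemp(suffix=".pkl")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(raw)
        return pd.read_pickle(path)
    finally:
        # Always clean up the temporary file, even if unpickling fails.
        os.remove(path)
```

For the blobs from the question, the call would look like `df = read_pickled_blob(bd, decompress=lz4.decompress)`. For the file-based reproduction, `pd.read_pickle("cp0141.p")` can be called directly.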


Source: https://habr.com/ru/post/981014/

