How to read data serialized with Python 2 cPickle using Python 3 pickle?

I am trying to work with the CIFAR-10 dataset, which is available in a special version for Python.

It is a set of binary files, each of which is a dictionary containing 10k numpy matrices. The files were evidently created with Python 2's cPickle.

I tried loading it from Python 2 as follows:

    import cPickle
    with open("data/data_batch_1", "rb") as f:
        data = cPickle.load(f)

This works great. However, if I try to load the data from Python 3 (using pickle instead of cPickle), it does not work:

    import pickle
    with open("data/data_batch_1", "rb") as f:
        data = pickle.load(f)

It fails with the following error:

 UnicodeDecodeError: 'ascii' codec can't decode byte 0x8b in position 6: ordinal not in range(128) 

Is there any way to convert the original dataset into a new one that can be read from Python 3? Or can I somehow read it from Python 3 directly?

I tried loading it with cPickle, dumping it to JSON, and then pickling it again, but numpy matrices obviously cannot be written to a JSON file.

1 answer

You need to tell pickle.load() which codec to use for those bytes, or tell it to load the data as bytes instead. From the pickle.load() documentation:

The encoding and errors tell pickle how to decode 8-bit string instances pickled by Python 2; these default to 'ASCII' and 'strict', respectively. The encoding can be 'bytes' to read these 8-bit string instances as bytes objects.
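To make the effect of the encoding argument concrete, here is a small sketch that does not need the dataset. PY2_PICKLE is a hand-written protocol-0 pickle, intended to be equivalent to what Python 2 would produce for {'data': '\x8b'}; it is only an illustration, not taken from CIFAR-10. The 'latin1' variant is shown as an alternative codec that can decode any byte, in case you would rather get str than bytes.

```python
import pickle

# Hand-written protocol-0 pickle (illustrative assumption): a dict with the
# Python 2 str key 'data' mapped to a Python 2 str holding the byte 0x8b.
PY2_PICKLE = b"(dp0\nS'data'\np1\nS'\\x8b'\np2\ns."

# Defaults (encoding="ASCII", errors="strict"): byte 0x8b is not valid ASCII.
try:
    pickle.loads(PY2_PICKLE)
    default_failed = False
except UnicodeDecodeError:
    default_failed = True

# encoding="bytes": Python 2 str objects come back as bytes, keys included.
as_bytes = pickle.loads(PY2_PICKLE, encoding="bytes")

# encoding="latin1": each byte maps to exactly one code point, so no byte
# value can trigger a decoding error; the strings come back as str.
as_latin1 = pickle.loads(PY2_PICKLE, encoding="latin1")
```

Which of the two to pick depends on whether you want the dictionary keys as bytes or as str.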

To load those strings as bytes objects:

    import pickle
    with open("data/data_batch_1", "rb") as f:
        data = pickle.load(f, encoding='bytes')
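One consequence of encoding='bytes' is that dictionary keys which were Python 2 str also come back as bytes, so you would have to write data[b'data'] rather than data['data']. If that is inconvenient, a minimal sketch of a workaround could look like this. The helper names load_py2_pickle and decode_keys are made up for illustration, and the sketch assumes the keys are plain ASCII.

```python
import pickle


def load_py2_pickle(path):
    """Load a pickle written by Python 2, keeping its 8-bit strings as bytes."""
    with open(path, "rb") as f:
        return pickle.load(f, encoding="bytes")


def decode_keys(d, encoding="ascii"):
    """Return a copy of dict d with bytes keys decoded to str.

    Values are left untouched, so binary payloads stay as they were loaded.
    Assumes the keys are decodable with the given encoding.
    """
    return {
        (k.decode(encoding) if isinstance(k, bytes) else k): v
        for k, v in d.items()
    }
```

Usage would then be something like batch = decode_keys(load_py2_pickle("data/data_batch_1")), after which the entries can be looked up with ordinary str keys.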

Source: https://habr.com/ru/post/1236553/

