I am trying to serialize a large Python object, consisting of a tuple of numpy arrays, using pickle/cPickle and gzip. The procedure works up to a certain data size, after which I get the following error:
    --> 121     cPickle.dump(dataset_pickle, f)

    ***/gzip.pyc in write(self, data)
        238         print(type(self.crc))
        239         print(self.crc)
    --> 240         self.crc = zlib.crc32(data, self.crc) & 0xffffffffL
        241         self.fileobj.write( self.compress.compress(data) )

    OverflowError: size does not fit in an int
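For context, the dump is done roughly like this (a minimal sketch; the file name, variable names, and array shapes are illustrative, not my exact code):

    import gzip
    import cPickle
    import numpy as np

    # Illustrative data: a tuple of large numpy arrays (my real arrays total ~1.5 GB)
    dataset_pickle = (np.zeros((100000, 784), dtype=np.float64),
                      np.zeros(100000, dtype=np.int64))

    # Pickle the tuple straight into a gzip file object
    f = gzip.open('dataset.pkl.gz', 'wb')
    cPickle.dump(dataset_pickle, f)
    f.close()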
The numpy array is about 1.5 GB in size, and the string passed to zlib.crc32 is over 2 GB. I am working on a 64-bit machine and my Python is also 64-bit:
    >>> import sys
    >>> sys.maxsize
    9223372036854775807
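A quick way to check whether zlib.crc32 alone chokes on buffers larger than 2 GB (assuming a few spare GB of memory; the exact behaviour may depend on the Python build):

    import zlib

    # Build a string just over 2 GiB and feed it to crc32 directly;
    # if the problem is in zlib.crc32 itself, this should raise the same OverflowError
    big = 'a' * (2 ** 31 + 1)
    print(zlib.crc32(big) & 0xffffffff)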
Is this a bug in Python, or am I doing something wrong? Are there any good alternatives for compressing and serializing numpy arrays? I'm looking at numpy.savez, PyTables, and HDF5 right now, but it would be nice to know why I am running into this, since I do have enough memory.
Update: I remember reading somewhere that this could be caused by using an old version of Numpy (and I was), but I have since switched entirely to numpy.save/savez, which is actually faster than cPickle (at least in my case).
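For reference, the savez-based replacement looks roughly like this (a sketch; file and array names are illustrative):

    import numpy as np

    # Illustrative arrays standing in for my real dataset
    features = np.zeros((100000, 784), dtype=np.float64)
    labels = np.zeros(100000, dtype=np.int64)

    # savez writes an .npz archive; np.savez_compressed can be used instead
    # if on-disk compression is still wanted
    np.savez('dataset.npz', features=features, labels=labels)

    # Loading gives a dict-like object keyed by the names used above
    data = np.load('dataset.npz')
    features2 = data['features']
    labels2 = data['labels']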