Python gzip: OverflowError size does not fit in int

I am trying to serialize a large Python object, consisting of a tuple of numpy arrays, using pickle/cPickle and gzip. This works up to a certain data size, after which I get the following error:

    --> 121     cPickle.dump(dataset_pickle, f)

    ***/gzip.pyc in write(self, data)
        238         print(type(self.crc))
        239         print(self.crc)
    --> 240         self.crc = zlib.crc32(data, self.crc) & 0xffffffffL
        241         self.fileobj.write(self.compress.compress(data))

    OverflowError: size does not fit in an int
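For context, the write path is essentially this (a minimal sketch; the arrays and the file name here are placeholders for my real dataset_pickle tuple):

    import cPickle
    import gzip
    import numpy as np

    # Placeholder data: two ~1.6 GB float64 arrays, so the pickled
    # stream handed to gzip's write() exceeds 2 GB.
    dataset_pickle = (np.zeros(2 * 10**8), np.zeros(2 * 10**8))

    with gzip.open('dataset.pkl.gz', 'wb') as f:
        cPickle.dump(dataset_pickle, f)  # raises OverflowError for me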

The numpy array is about 1.5 GB in size, and the string sent to zlib.crc32 is over 2 GB. I am working on a 64-bit machine and my Python is 64-bit as well:

    >>> import sys
    >>> sys.maxsize
    9223372036854775807

Is this a bug in Python, or am I doing something wrong? Are there any good alternatives for compressing and serializing numpy arrays? I'm looking at numpy.savez, PyTables, and HDF5 right now, but it would be nice to know why I am having these problems, since I do have enough memory.


Update: I remember reading somewhere that this could be caused by using an old version of numpy (and I was using one), but I have since switched completely to numpy.save/savez, which is actually faster than cPickle (at least in my case).
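For reference, the replacement is straightforward (train_x and train_y are illustrative names for the arrays in my tuple):

    import numpy as np

    # Each array is stored under its keyword name; savez_compressed
    # also applies zlib compression, so no separate gzip step is needed.
    np.savez_compressed('dataset.npz', train_x=train_x, train_y=train_y)

    # Arrays come back by key from the .npz archive.
    data = np.load('dataset.npz')
    train_x, train_y = data['train_x'], data['train_y']

Plain numpy.savez does the same without compression, which trades disk space for speed.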

1 answer

This seems to be a bug in Python 2.7:

https://bugs.python.org/issue23306

Checking the bug report, there does not appear to be a fix pending. The best option would be to upgrade to Python 3, which apparently is not affected by this bug.
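If upgrading is not an option, one possible workaround on 2.7 (an untested sketch, not a confirmed fix) is to wrap the gzip file in an object that splits each write into slices under 2 GB, so zlib.crc32 never receives an oversized buffer:

    import cPickle
    import gzip

    class ChunkedWriter(object):
        # Hypothetical wrapper: breaks each write into 1 GiB pieces,
        # safely below the 2 GB limit that trips zlib.crc32 on 2.7.
        def __init__(self, fileobj, chunk=1 << 30):
            self.fileobj = fileobj
            self.chunk = chunk

        def write(self, data):
            for i in xrange(0, len(data), self.chunk):
                self.fileobj.write(data[i:i + self.chunk])

    with gzip.open('dataset.pkl.gz', 'wb') as f:
        cPickle.dump(dataset_pickle, ChunkedWriter(f),
                     cPickle.HIGHEST_PROTOCOL)

cPickle.dump only requires an object with a write() method, so the wrapper slots in where the file normally goes.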
