Problem with big data compression in Python

I have a script in Python to compress a large string:

    import zlib

    def processFiles():
        ...
        s = """Large string more than 2Gb"""
        data = zlib.compress(s)
        ...

When I run this script, I get this error message:

    ERROR: Traceback (most recent call last):
      File "./../commands/sce.py", line 438, in processFiles
        data = zlib.compress(s)
    OverflowError: size does not fit in an int

Some information:

zlib.__version__ = '1.0'

zlib.ZLIB_VERSION = '1.2.7'

    # python -V
    Python 2.7.3
    # uname -a
    Linux app2 3.2.0-4-amd64 #1 SMP Debian 3.2.54-2 x86_64 GNU/Linux
    # free
                 total       used       free     shared    buffers     cached
    Mem:      65997404    8096588   57900816          0     184260    7212252
    -/+ buffers/cache:     700076   65297328
    Swap:     35562236          0   35562236
    # ldconfig -p | grep python
        libpython2.7.so.1.0 (libc6,x86-64) => /usr/lib/libpython2.7.so.1.0
        libpython2.7.so (libc6,x86-64) => /usr/lib/libpython2.7.so

How can I compress data larger than 2 GB in Python?

3 answers

This is not a RAM problem. The Python zlib bindings pass the buffer size as a C int, so a single zlib.compress() call cannot handle more than about 2 GB at once; that is exactly what the OverflowError is telling you.

Split your data into chunks below that limit and feed them one at a time to a compression object, as sketched below.
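For example, if the data comes from a file, you can stream it through a single compression object in blocks. A minimal sketch, assuming the input and output paths and the block size (all made up here):

    import zlib

    CHUNK = 64 * 1024 * 1024  # 64 MB per read; anything under 2 GB works

    compressor = zlib.compressobj()
    with open('/path/to/big_input', 'rb') as src, open('/path/to/output.zlib', 'wb') as dst:
        while True:
            block = src.read(CHUNK)
            if not block:
                break
            dst.write(compressor.compress(block))
        dst.write(compressor.flush())  # emit whatever zlib still has buffered

This way the full input never has to sit in memory at once.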


My big data compression function:

    def compressData(self, s):
        compressed = ''
        begin = 0
        blockSize = 1073741824  # 1 GB per chunk, safely under the int limit
        compressor = zlib.compressobj()
        while begin < len(s):
            # feed the next slice into the same compressor stream
            compressed = compressed + compressor.compress(s[begin:begin + blockSize])
            begin = begin + blockSize
        compressed = compressed + compressor.flush()  # emit the buffered tail
        return compressed
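Note that repeated string concatenation copies the growing result on every iteration. Collecting the pieces in a list and joining once at the end avoids that; a variant of the same function (a sketch, behavior unchanged):

    def compressData(self, s):
        blockSize = 1073741824  # 1 GB per chunk, safely under the int limit
        compressor = zlib.compressobj()
        parts = []
        for begin in xrange(0, len(s), blockSize):
            parts.append(compressor.compress(s[begin:begin + blockSize]))
        parts.append(compressor.flush())
        return ''.join(parts)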

Try passing it through a compression object ...

    import zlib

    compressor = zlib.compressobj()
    with open('/var/log/syslog') as inputfile:
        data = compressor.compress(inputfile.read())
    data += compressor.flush()  # without flush() the compressed stream is incomplete
    print data
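Whichever route you take, the matching decompression side can be chunked the same way with zlib.decompressobj(). A sketch, assuming a hypothetical input file name:

    import zlib

    decompressor = zlib.decompressobj()
    parts = []
    with open('/path/to/output.zlib', 'rb') as f:
        while True:
            block = f.read(64 * 1024 * 1024)
            if not block:
                break
            parts.append(decompressor.decompress(block))
    parts.append(decompressor.flush())  # recover any remaining buffered bytes
    data = ''.join(parts)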
