Strange "BadZipfile: Bad CRC-32" problem

This code is a simplified version of code in a Django application that receives an uploaded zip file via an HTTP multipart POST and processes the data inside it:

    #!/usr/bin/env python

    import csv, sys, StringIO, traceback, zipfile
    try:
        import io
    except ImportError:
        sys.stderr.write('Could not import the `io` module.\n')

    def get_zip_file(filename, method):
        if method == 'direct':
            return zipfile.ZipFile(filename)
        elif method == 'StringIO':
            data = open(filename, 'rb').read()  # binary mode: zip data is bytes
            return zipfile.ZipFile(StringIO.StringIO(data))
        elif method == 'BytesIO':
            data = open(filename, 'rb').read()
            return zipfile.ZipFile(io.BytesIO(data))


    def process_zip_file(filename, method, open_defaults_file):
        zip_file = get_zip_file(filename, method)
        items_file = zip_file.open('items.csv')
        csv_file = csv.DictReader(items_file)

        try:
            for idx, row in enumerate(csv_file):
                image_filename = row['image1']

                if open_defaults_file:
                    # open a second member while items.csv is still being read
                    z = zip_file.open('defaults.csv')
                    z.close()

            sys.stdout.write('Processed %d items.\n' % idx)
        except zipfile.BadZipfile:
            sys.stderr.write('Processing failed on item %d\n\n%s'
                             % (idx, traceback.format_exc()))


    process_zip_file(sys.argv[1], sys.argv[2], int(sys.argv[3]))

Pretty simple. We open the zip file and one or two CSV files inside the zip file.

What's strange is this: if I run it with a large zip file (~13 MB), have it create the ZipFile from a StringIO.StringIO or an io.BytesIO (perhaps anything other than a plain filename? I had similar problems in the Django application when trying to create a ZipFile from a TemporaryUploadedFile, or even from a file object created by calling os.tmpfile() and shutil.copyfileobj()), and have it open TWO CSV files rather than just one, then it fails toward the end of processing. Here's the output I see on a Linux system:

    $ ./test_zip_file.py ~/data.zip direct 1
    Processed 250 items.
    $ ./test_zip_file.py ~/data.zip StringIO 1
    Processing failed on item 242

    Traceback (most recent call last):
      File "./test_zip_file.py", line 26, in process_zip_file
        for idx, row in enumerate(csv_file):
      File ".../python2.7/csv.py", line 104, in next
        row = self.reader.next()
      File ".../python2.7/zipfile.py", line 523, in readline
        return io.BufferedIOBase.readline(self, limit)
      File ".../python2.7/zipfile.py", line 561, in peek
        chunk = self.read(n)
      File ".../python2.7/zipfile.py", line 581, in read
        data = self.read1(n - len(buf))
      File ".../python2.7/zipfile.py", line 641, in read1
        self._update_crc(data, eof=eof)
      File ".../python2.7/zipfile.py", line 596, in _update_crc
        raise BadZipfile("Bad CRC-32 for file %r" % self.name)
    BadZipfile: Bad CRC-32 for file 'items.csv'
    $ ./test_zip_file.py ~/data.zip BytesIO 1
    Processing failed on item 242

    Traceback (most recent call last):
      File "./test_zip_file.py", line 26, in process_zip_file
        for idx, row in enumerate(csv_file):
      File ".../python2.7/csv.py", line 104, in next
        row = self.reader.next()
      File ".../python2.7/zipfile.py", line 523, in readline
        return io.BufferedIOBase.readline(self, limit)
      File ".../python2.7/zipfile.py", line 561, in peek
        chunk = self.read(n)
      File ".../python2.7/zipfile.py", line 581, in read
        data = self.read1(n - len(buf))
      File ".../python2.7/zipfile.py", line 641, in read1
        self._update_crc(data, eof=eof)
      File ".../python2.7/zipfile.py", line 596, in _update_crc
        raise BadZipfile("Bad CRC-32 for file %r" % self.name)
    BadZipfile: Bad CRC-32 for file 'items.csv'
    $ ./test_zip_file.py ~/data.zip StringIO 0
    Processed 250 items.
    $ ./test_zip_file.py ~/data.zip BytesIO 0
    Processed 250 items.

Incidentally, the code also fails under the same conditions on my OS X system, but differently: instead of raising BadZipfile, it seems to read corrupted data and gets very confused.

All of this suggests that I'm doing something in this code that I shouldn't, for example calling zipfile.open on one file while another file in the same ZipFile object is still open. This doesn't seem to be a problem when using ZipFile(filename), but maybe it is problematic when passing ZipFile a file-like object because of some implementation detail of the zipfile module?

Perhaps I missed something in the ZipFile docs? Or maybe this isn't documented yet? Or (least likely) is it a bug in the zipfile module?

3 answers

I think I just found the problem and a solution, but unfortunately I had to replace Python's zipfile module with a hacked copy of my own (called myzipfile here).

    $ diff -u ~/run/lib/python2.7/zipfile.py myzipfile.py
    --- /home/msabramo/run/lib/python2.7/zipfile.py 2010-12-22 17:02:34.000000000 -0800
    +++ myzipfile.py 2011-04-11 11:51:59.000000000 -0700
    @@ -5,6 +5,7 @@
     import binascii, cStringIO, stat
     import io
     import re
    +import copy
     
     try:
         import zlib # We may need its compression method
    @@ -877,7 +878,7 @@
     
             # Only open a new file for instances where we were not
             # given a file object in the constructor
             if self._filePassed:
    -            zef_file = self.fp
    +            zef_file = copy.copy(self.fp)
             else:
                 zef_file = open(self.filename, 'rb')

The problem with the standard zipfile module is that when it is given a file object (rather than a filename), it uses that same file object for every call to the open method. This means that tell and seek are called on one shared file, so opening several members of the zip file makes them share a single file position, and the streams returned by multiple open calls end up stepping all over each other. In contrast, when given a filename, open opens a new file object each time. My solution is that, when a file object is passed in, instead of using that file object directly, I use a copy of it.
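A toy model may make the shared-position problem easier to see. This is not zipfile's real code: `SharedPointerReader` and `IndependentReader` are hypothetical classes written for illustration (Python 3). The first mimics a member stream that keeps reading from a file object shared with other streams and never re-seeks, as described above for Python 2.7 with a passed-in file object. The second remembers its own offset and seeks before every read, which is roughly the strategy later Python 3 versions of zipfile adopted internally. In the real module the misplaced bytes are presumably checksummed as part of items.csv, which would explain "Bad CRC-32".

```python
import io


class SharedPointerReader:
    """Hypothetical stand-in for a member stream that shares the archive's
    file position with every other stream and never re-seeks."""

    def __init__(self, fp, offset):
        self.fp = fp
        self.fp.seek(offset)  # positioned once, at "open" time

    def read(self, n):
        # Reads from wherever the shared pointer happens to be now.
        return self.fp.read(n)


class IndependentReader:
    """Hypothetical stream that tracks its own position and seeks to it
    before every read, so other streams cannot disturb it."""

    def __init__(self, fp, offset):
        self.fp = fp
        self.pos = offset

    def read(self, n):
        self.fp.seek(self.pos)
        data = self.fp.read(n)
        self.pos += len(data)
        return data


archive = io.BytesIO(b"AAAAAAAABBBBBBBB")  # two made-up 8-byte "members"

a = SharedPointerReader(archive, 0)
first = a.read(4)                # bytes of member A
SharedPointerReader(archive, 8)  # "opening" member B moves the shared pointer
second = a.read(4)               # now returns member B's bytes: A is corrupted

x = IndependentReader(archive, 0)
y = IndependentReader(archive, 8)
x_first = x.read(4)
y.read(4)                        # reading B in between does not affect x
x_second = x.read(4)
```

The second model is also why opening by filename works in 2.7: each stream then has its own file object, and therefore its own position.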

This change to zipfile fixes the problems I was seeing:

    $ ./test_zip_file.py ~/data.zip StringIO 1
    Processed 250 items.
    $ ./test_zip_file.py ~/data.zip BytesIO 1
    Processed 250 items.
    $ ./test_zip_file.py ~/data.zip direct 1
    Processed 250 items.

but I don't know whether it has other negative effects on zipfile...
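If patching zipfile is not an option, a less invasive workaround is to never keep two member streams open at once on a ZipFile built from a file object: read each member completely (ZipFile.read exists in 2.7 as well) and parse the CSV from memory. This is a minimal Python 3 sketch, not the original code; `process_items` is a hypothetical helper, and it assumes both CSV files fit in memory and are UTF-8 encoded.

```python
import csv
import io
import zipfile


def process_items(zip_bytes):
    """Hypothetical helper: read each member of the archive completely before
    touching the next one, so no two member streams are ever open at once on
    the shared in-memory file object."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        # Each read() opens, fully consumes and closes one member.
        # Assumes both CSV files fit in memory and are UTF-8 encoded.
        items_text = zf.read("items.csv").decode("utf-8")
        defaults_text = zf.read("defaults.csv").decode("utf-8")

    # Parse from memory; the ZipFile is no longer involved.
    defaults = list(csv.DictReader(io.StringIO(defaults_text)))
    image_filenames = [row["image1"] for row in csv.DictReader(io.StringIO(items_text))]
    return image_filenames, defaults


# Usage with a small, made-up archive built in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as out:
    out.writestr("items.csv", "image1\n" + "".join("img%d.jpg\n" % i for i in range(250)))
    out.writestr("defaults.csv", "key,value\na,1\n")

filenames, defaults = process_items(buf.getvalue())
print(len(filenames), defaults)
```

The trade-off is memory: the decoded text of each member is held in full, which is usually fine for a ~13 MB upload but not for arbitrarily large archives.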

EDIT: I just found a mention of this in the Python docs that I had previously overlooked. The documentation for ZipFile.open (http://docs.python.org/library/zipfile.html#zipfile.ZipFile.open) says:

Note: If the ZipFile was created by passing in a file-like object as the first argument to the constructor, then the object returned by open() shares the ZipFile's file pointer. Under these circumstances, the object returned by open() should not be used after any additional operations are performed on the ZipFile object. If the ZipFile was created by passing in a string (the filename) as the first argument to the constructor, then open() will create a new file object that will be held by the ZipExtFile, allowing it to operate independently of the ZipFile.
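Following that note, another workaround that needs no patched module is to make sure ZipFile is given a filename. Below is a sketch assuming the data arrives as a seekable file-like object; `open_zip_by_name` is a hypothetical helper, and the code should run on both 2.7 and Python 3. In 2.7 this gives each open() call its own file object, as the note says; as far as I know newer Python 3 releases handle the shared pointer internally, so there the copy should not be needed.

```python
import io
import os
import shutil
import tempfile
import zipfile


def open_zip_by_name(fileobj):
    """Hypothetical helper: copy a file-like object (for example an uploaded
    file) to a named temporary file and open the ZipFile by filename.
    Returns (zip_file, path); the caller must close zip_file and delete path."""
    fileobj.seek(0)
    tmp = tempfile.NamedTemporaryFile(suffix=".zip", delete=False)
    with tmp:  # closing the handle first lets ZipFile reopen it, even on Windows
        shutil.copyfileobj(fileobj, tmp)
    return zipfile.ZipFile(tmp.name), tmp.name


# Usage with a small, made-up archive standing in for the upload.
upload = io.BytesIO()
with zipfile.ZipFile(upload, "w") as out:
    out.writestr("items.csv", "image1\nimg0.jpg\n")
    out.writestr("defaults.csv", "key,value\na,1\n")

zip_file, path = open_zip_by_name(upload)
try:
    items = zip_file.open("items.csv")
    header = items.readline()
    # Open a second member while items.csv is still being read.
    zip_file.open("defaults.csv").close()
    rest = items.read()
    items.close()
finally:
    zip_file.close()
    os.remove(path)
```

For a Django TemporaryUploadedFile, which already lives on disk, passing its temporary_file_path() to ZipFile directly would avoid the extra copy.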


What I did was update setuptools and then download again, and now it works:

https://pypi.python.org/pypi/setuptools/35.0.1


In my case, this solved the problem:

    pip uninstall pillow

Source: https://habr.com/ru/post/1347587/

