The code below is a simplified version of some code in a Django application that receives an uploaded zip file via an HTTP multipart POST and processes the data inside:
    #!/usr/bin/env python

    import csv, sys, StringIO, traceback, zipfile
    try:
        import io
    except ImportError:
        sys.stderr.write('Could not import the `io` module.\n')

    def get_zip_file(filename, method):
        if method == 'direct':
            return zipfile.ZipFile(filename)
        elif method == 'StringIO':
            data = file(filename).read()
            return zipfile.ZipFile(StringIO.StringIO(data))
        elif method == 'BytesIO':
            data = file(filename).read()
            return zipfile.ZipFile(io.BytesIO(data))


    def process_zip_file(filename, method, open_defaults_file):
        zip_file = get_zip_file(filename, method)
        items_file = zip_file.open('items.csv')
        csv_file = csv.DictReader(items_file)

        try:
            for idx, row in enumerate(csv_file):
                image_filename = row['image1']

                if open_defaults_file:
                    z = zip_file.open('defaults.csv')
                    z.close()

            sys.stdout.write('Processed %d items.\n' % idx)
        except zipfile.BadZipfile:
            sys.stderr.write('Processing failed on item %d\n\n%s'
                             % (idx, traceback.format_exc()))


    process_zip_file(sys.argv[1], sys.argv[2], int(sys.argv[3]))
Pretty simple. We open the zip file and one or two CSV files inside the zip file.
What is strange is that if I run this with a large zip file (~13 MB), have it instantiate the ZipFile from a StringIO.StringIO or an io.BytesIO, and have it open TWO csv files rather than just one, it fails toward the end of processing. (Perhaps this happens with anything other than a plain filename? I had similar problems in the Django application when trying to create a ZipFile from a TemporaryUploadedFile, or even from a file object created by calling os.tmpfile() and shutil.copyfileobj().) Here is the output I see on a Linux system:
    $ ./test_zip_file.py ~/data.zip direct 1
    Processed 250 items.

    $ ./test_zip_file.py ~/data.zip StringIO 1
    Processing failed on item 242

    Traceback (most recent call last):
      File "./test_zip_file.py", line 26, in process_zip_file
        for idx, row in enumerate(csv_file):
      File ".../python2.7/csv.py", line 104, in next
        row = self.reader.next()
      File ".../python2.7/zipfile.py", line 523, in readline
        return io.BufferedIOBase.readline(self, limit)
      File ".../python2.7/zipfile.py", line 561, in peek
        chunk = self.read(n)
      File ".../python2.7/zipfile.py", line 581, in read
        data = self.read1(n - len(buf))
      File ".../python2.7/zipfile.py", line 641, in read1
        self._update_crc(data, eof=eof)
      File ".../python2.7/zipfile.py", line 596, in _update_crc
        raise BadZipfile("Bad CRC-32 for file %r" % self.name)
    BadZipfile: Bad CRC-32 for file 'items.csv'

    $ ./test_zip_file.py ~/data.zip BytesIO 1
    Processing failed on item 242

    Traceback (most recent call last):
      File "./test_zip_file.py", line 26, in process_zip_file
        for idx, row in enumerate(csv_file):
      File ".../python2.7/csv.py", line 104, in next
        row = self.reader.next()
      File ".../python2.7/zipfile.py", line 523, in readline
        return io.BufferedIOBase.readline(self, limit)
      File ".../python2.7/zipfile.py", line 561, in peek
        chunk = self.read(n)
      File ".../python2.7/zipfile.py", line 581, in read
        data = self.read1(n - len(buf))
      File ".../python2.7/zipfile.py", line 641, in read1
        self._update_crc(data, eof=eof)
      File ".../python2.7/zipfile.py", line 596, in _update_crc
        raise BadZipfile("Bad CRC-32 for file %r" % self.name)
    BadZipfile: Bad CRC-32 for file 'items.csv'

    $ ./test_zip_file.py ~/data.zip StringIO 0
    Processed 250 items.

    $ ./test_zip_file.py ~/data.zip BytesIO 0
    Processed 250 items.
Incidentally, the code also fails under the same conditions on my OS X system, but in a different way. Instead of raising BadZipfile, it seems to read corrupted data and gets very confused.
All of this suggests that I am doing something in this code that I am not supposed to do, for example calling zipfile.open on one member while another member of the same zip file object is still open. This does not seem to be a problem when using ZipFile(filename), but perhaps it is problematic when passing ZipFile a file-like object, because of some implementation detail in the zipfile module?
Perhaps I missed something in the zipfile docs? Or maybe this is not documented yet? Or (least likely) is it a bug in the zipfile module?
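One way to probe the overlapping-open hypothesis is a variant that reads items.csv completely into memory before opening any other member, so that only one member is ever being read at a time. The following is a minimal sketch of that idea, not the original script: it is written for Python 3 (the script above targets Python 2.7), the names build_sample_zip and process_without_overlapping_opens are hypothetical, the sample CSV contents are made up, and UTF-8 encoding is assumed. Whether this avoids the CRC error on the real data.zip is untested.

```python
import csv
import io
import zipfile


def build_sample_zip():
    """Build a tiny in-memory zip standing in for data.zip (contents are made up)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("items.csv", "image1\na.png\nb.png\nc.png\n")
        zf.writestr("defaults.csv", "key,value\ncolor,red\n")
    buf.seek(0)
    return buf


def process_without_overlapping_opens(fileobj):
    """Read items.csv fully before touching any other member of the archive."""
    zip_file = zipfile.ZipFile(fileobj)
    # ZipFile.read() returns the whole member as bytes, so no handle on
    # items.csv remains open while other members are accessed.
    items_text = zip_file.read("items.csv").decode("utf-8")  # assumed encoding
    rows = list(csv.DictReader(io.StringIO(items_text)))
    for row in rows:
        image_filename = row["image1"]
        # At this point no other member is being streamed from the archive.
        with zip_file.open("defaults.csv") as defaults:
            defaults.read()
    return len(rows)


if __name__ == "__main__":
    count = process_without_overlapping_opens(build_sample_zip())
    print("Processed %d items." % count)
```

If the failure disappears with this variant on the real archive, that would point toward the interleaved reads on a shared file-like object rather than toward the archive itself. The trade-off is that the whole of items.csv is held in memory at once.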