Python unzip extremely slow?

Question


Can someone explain the following mystery?

I created a binary file of ~ 37 [MB] size. Zipping it in Ubuntu, using a terminal, took less than 1 [sec]. Then I tried Python: zipping it programmatically (using the zipfile module) also took about 1 [sec].

Then I tried to unzip the created zip file. In Ubuntu - using a terminal - this took less than 1 [sec].

In Python, the code to unzip (using the zipfile module) took about 37 [sec] to run! Any ideas why?


python linux ubuntu unzip zip

user3262424 Feb 14 '11 at 22:16


3 answers

kirpit · Answer 1 · 2011-11-06T13:53:22+0000

I was trying to unzip/extract zip files with Python, and the low-level approach of "create a ZipFile object, loop over its .namelist(), read the files and write them to the file system" does not seem very Pythonic. So I started digging into zipfile objects, which I think are not very well documented, and listed all the methods of the object:

    >>> from zipfile import ZipFile
    >>> filepath = '/srv/pydocfiles/packages/ebook.zip'
    >>> zip = ZipFile(filepath)
    >>> dir(zip)
    ['NameToInfo', '_GetContents', '_RealGetContents', '__del__', '__doc__',
     '__enter__', '__exit__', '__init__', '__module__', '_allowZip64',
     '_didModify', '_extract_member', '_filePassed', '_writecheck', 'close',
     'comment', 'compression', 'debug', 'extract', 'extractall', 'filelist',
     'filename', 'fp', 'getinfo', 'infolist', 'mode', 'namelist', 'open',
     'printdir', 'pwd', 'read', 'setpassword', 'start_dir', 'testzip',
     'write', 'writestr']

And there it is: an "extractall" method, just like tarfile's extractall! (It is available in Python 2.6 and 2.7, but not 2.5.)
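To make the difference concrete, here is a minimal sketch (not the code from the question) that extracts the same archive both ways: once with the manual namelist() loop and once with extractall(). The sample archive, its member names and the helper names (extract_manually, extract_with_extractall, read_tree, demo) are made up for illustration, and the syntax assumes Python 3.

```python
import os
import tempfile
from zipfile import ZipFile, ZIP_DEFLATED


def extract_manually(archive_path, dest_dir):
    """Low-level approach: loop over the members and write each one out."""
    with ZipFile(archive_path) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # directory entry, nothing to write
            # Note: this trusts the member name as-is; extractall() sanitizes it.
            target = os.path.join(dest_dir, name)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with open(target, "wb") as out:
                out.write(archive.read(name))


def extract_with_extractall(archive_path, dest_dir):
    """High-level approach: let zipfile do the looping."""
    with ZipFile(archive_path) as archive:
        archive.extractall(path=dest_dir)


def read_tree(root):
    """Map each relative file path under root to its contents."""
    contents = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full_path = os.path.join(dirpath, filename)
            with open(full_path, "rb") as handle:
                contents[os.path.relpath(full_path, root)] = handle.read()
    return contents


def demo():
    """Build a small hypothetical archive and extract it both ways."""
    with tempfile.TemporaryDirectory() as work_dir:
        archive_path = os.path.join(work_dir, "sample.zip")
        with ZipFile(archive_path, "w", ZIP_DEFLATED) as archive:
            archive.writestr("a.txt", b"hello")
            archive.writestr("sub/b.bin", bytes(1000))

        manual_dir = os.path.join(work_dir, "manual")
        auto_dir = os.path.join(work_dir, "auto")
        os.makedirs(manual_dir)
        os.makedirs(auto_dir)

        extract_manually(archive_path, manual_dir)
        extract_with_extractall(archive_path, auto_dir)
        return read_tree(manual_dir), read_tree(auto_dir)


if __name__ == "__main__":
    manual, auto = demo()
    print("same result:", manual == auto)
```

Both functions should produce the same files; extractall() is simply shorter and also strips absolute paths and ".." components from member names.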

As for performance: the ebook.zip file is 84.6 MB (mostly PDF files), and the uncompressed folder is 103 MB when extracted with the default "Archive Utility" on Mac OS X 10.5. I then did the same extraction with Python's timeit module:

    >>> from timeit import Timer
    >>> t = Timer("filepath = '/srv/pydocfiles/packages/ebook.zip'; \
    ... extract_to = '/tmp/pydocnet/build'; \
    ... from zipfile import ZipFile; \
    ... ZipFile(filepath).extractall(path=extract_to)")
    >>>
    >>> t.timeit(1)
    1.8670060634613037

which took less than 2 seconds on a heavily loaded machine where 90% of the memory was in use by other applications.

Hope this helps someone.

jochen · Answer 2 · 2011-03-07T20:25:55+0000

I don't know what code you use to unzip your file, but the following works for me: after creating a zip archive "test.zip" containing just one file "file1", the following Python script extracts file1 from the archive:

    from zipfile import ZipFile, ZIP_DEFLATED

    # Open the archive read-only and load the single member into memory.
    zip = ZipFile("test.zip", mode='r', compression=ZIP_DEFLATED, allowZip64=False)
    data = zip.read("file1")
    print(len(data))

It takes almost no time: I tried a 37 MB input file, which compressed to a 15 MB zip archive. With that input, the Python script took 0.346 seconds on my MacBook Pro. Maybe, in your case, the 37 seconds were spent on whatever you did with the data?
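One way to check that guess is to time the read and the processing separately. This is only a sketch under assumptions: the in-memory archive, the 1 MiB payload and the helper names (build_archive, byte_sum, time_read_and_process) are invented for illustration, and byte_sum merely stands in for whatever the real code does with the data. It assumes Python 3.

```python
import io
import os
import time
from zipfile import ZipFile, ZIP_DEFLATED


def build_archive(payload):
    """Create an in-memory zip archive with a single member called 'file1'."""
    buffer = io.BytesIO()
    with ZipFile(buffer, "w", ZIP_DEFLATED) as archive:
        archive.writestr("file1", payload)
    buffer.seek(0)
    return buffer


def byte_sum(data):
    """Placeholder for the real processing step (hypothetical)."""
    return sum(data)


def time_read_and_process(archive_file, member, process):
    """Return the data, the processing result, and the two elapsed times."""
    start = time.perf_counter()
    with ZipFile(archive_file) as archive:
        data = archive.read(member)
    read_seconds = time.perf_counter() - start

    start = time.perf_counter()
    result = process(data)
    process_seconds = time.perf_counter() - start
    return data, result, read_seconds, process_seconds


if __name__ == "__main__":
    payload = os.urandom(1024) * 1024  # 1 MiB of sample data
    archive_file = build_archive(payload)
    data, result, read_s, process_s = time_read_and_process(
        archive_file, "file1", byte_sum
    )
    print("read: %.3f s, process: %.3f s" % (read_s, process_s))
```

If the second number dominates, the zipfile module is not the bottleneck.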

Rakesh · Answer 3 · 2011-06-06T13:58:10+0000

Instead of using the Python module, we can call the zip tool available on Ubuntu from Python. I use this because the Python zip module sometimes fails.

    import os

    filename = 'test'  # name of the file to compress
    # Run "7z a test.zip test" through the shell.
    os.system('7z a %s.zip %s' % (filename, filename))
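A variant of the same idea, sketched with subprocess instead of os.system so that the file name is passed as a separate argument rather than pasted into a shell string. The helper names (build_7z_command, zip_with_7z) are made up for illustration; the only assumption about 7z itself is the "7z a <archive> <source>" form already used above.

```python
import shutil
import subprocess


def build_7z_command(source, archive=None):
    """Return the argument list for '7z a <archive> <source>'."""
    if archive is None:
        archive = source + ".zip"
    return ["7z", "a", archive, source]


def zip_with_7z(source, archive=None):
    """Run 7z if it is installed; return its exit code, or None if it is missing."""
    if shutil.which("7z") is None:
        return None
    completed = subprocess.run(build_7z_command(source, archive))
    return completed.returncode


if __name__ == "__main__":
    print(build_7z_command("test"))
```

A non-zero exit code from zip_with_7z means 7z reported a warning or an error, which os.system would also return but which is easy to overlook there.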
