Is there a Python module for matching regular expressions in zip files?

Question

I have over a million text files compressed into 40 ZIP files. I also have a list of 500 phone model names. I want to find out how many times a particular model is mentioned in the text files.

Is there a Python module that can run a regex match against the files without unzipping them? Is there a simple way to solve this problem without unzipping?

+4

python regex text-processing zip

cnu Aug 18 '08 at 7:41

4 answers

You could loop through the zip files, reading the individual files with the zipfile module and running your regular expression on each one, which avoids having to unzip all the files at once.

I am fairly sure that you cannot run a regex over the compressed data itself, at least not in any meaningful way.
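To illustrate the point about compressed data, here is a minimal sketch. It builds a small archive in memory (the file name `review.txt` and the sample sentence are made up for the demonstration) and searches it twice: once over the raw archive bytes and once over the decompressed member.

```python
import io
import re
import zipfile

# Hypothetical sample data: one text file that mentions a phone model 50 times.
text = "I dropped my Nokia 3310 again. " * 50
pattern = re.compile(rb"Nokia 3310")

# Build a DEFLATE-compressed archive in memory instead of touching the disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("review.txt", text)
raw_bytes = buffer.getvalue()

# Search the compressed archive bytes directly.
raw_hits = pattern.findall(raw_bytes)

# Search the decompressed contents of the member.
with zipfile.ZipFile(io.BytesIO(raw_bytes)) as archive:
    member_hits = pattern.findall(archive.read("review.txt"))

print(len(raw_hits), len(member_hits))
```

The raw search finds nothing because DEFLATE re-encodes the text as Huffman codes and back-references, so the literal byte sequence is no longer present. Only the per-member read sees the original text.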

0

jeremiahd Aug 18 '08 at 8:06

To access the contents of a zip file, you need to unzip it, although the zipfile package makes it quite simple, since you can unzip each file in the archive individually.

Python zipfile module
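As a rough sketch of reading each member individually, `ZipFile.open()` gives a file-like object that decompresses on the fly, so nothing is extracted to disk. The archive contents, file names, and the `iPhone 3G` pattern below are invented for the example, and the members are assumed to be UTF-8 text.

```python
import io
import re
import zipfile

# Hypothetical archive built in memory so the sketch runs on its own.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("a.txt", "My iPhone 3G broke.\nNothing here.\n")
    archive.writestr("b.txt", "Still using an iPhone 3G.\n")

pattern = re.compile(r"iPhone 3G")
matching_lines = []

with zipfile.ZipFile(buffer) as archive:
    for name in archive.namelist():
        # open() returns a binary stream; wrap it to iterate over decoded lines.
        with archive.open(name) as member:
            for line in io.TextIOWrapper(member, encoding="utf-8"):
                if pattern.search(line):
                    matching_lines.append((name, line.rstrip("\n")))

print(matching_lines)
```

Streaming line by line keeps memory use low for large members. For small files, `archive.read(name)` is simpler.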

0

Craig.Nicol Aug 18 '08 at 8:10

Is it possible (at least theoretically) to read in the ZIP's Huffman coding and then translate the regular expression into that Huffman code? Could this be more efficient than decompressing the data first and then running the regex?

(Note: I know it would not be quite that simple: you would also have to deal with other aspects of the ZIP encoding, such as file layout, block structures, and back-references, but one can imagine it being fairly lightweight.)

EDIT: Also note that it is probably much more sensible to just use the zipfile solution.

0

Chris conway Sep 03 '08 at 14:42

Mark harrison · Accepted Answer · 2008-08-18T08:19:06+0000

There is nothing that will automatically do what you want.

However, there is a python zipfile module that will simplify the work. Here's how to iterate over lines in a file.

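    #!/usr/bin/python
    import zipfile

    f = zipfile.ZipFile('myfile.zip')
    for subfile in f.namelist():
        print(subfile)
        # read() returns bytes in Python 3, so decode before splitting into lines
        data = f.read(subfile).decode('utf-8', errors='replace')
        for line in data.split('\n'):
            print(line)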
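Building on that loop, the original problem could be approached roughly as follows. This is a sketch under several assumptions rather than a definitive solution: the function name `count_files_mentioning` and the helper `make_archive` are hypothetical, members are assumed to be UTF-8 text, model names are assumed to be plain ASCII, and "mentioned" is taken to mean "appears at least once in a file", so each file counts once per model.

```python
import io
import re
import zipfile
from collections import Counter


def count_files_mentioning(archives, models):
    """Count, per model, how many archived text files mention it at least once."""
    # Try longer names first so a longer model name wins over a shorter prefix.
    ordered = sorted(models, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(m) for m in ordered), re.IGNORECASE)
    canonical = {m.lower(): m for m in models}

    counts = Counter()
    for source in archives:  # each item is a path or a file-like object
        with zipfile.ZipFile(source) as archive:
            for name in archive.namelist():
                # Assumption: members are UTF-8 text; bad bytes are replaced.
                text = archive.read(name).decode("utf-8", errors="replace")
                found = {canonical[hit.lower()] for hit in pattern.findall(text)}
                counts.update(found)
    return counts


def make_archive(files):
    """Hypothetical helper: build an in-memory zip from a {name: text} dict."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, body in files.items():
            archive.writestr(name, body)
    buffer.seek(0)
    return buffer


archives = [
    make_archive({"1.txt": "Love my Nokia N95.", "2.txt": "nokia n95 vs iPhone 3G"}),
    make_archive({"3.txt": "No phones mentioned here."}),
]
counts = count_files_mentioning(archives, ["Nokia N95", "iPhone 3G"])
print(counts)
```

Compiling all 500 names into one alternation means each file is scanned once instead of 500 times. With real data, the list of in-memory archives would be replaced by the paths of the 40 ZIP files.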
