How to improve Python regex syntax?

Question

I am very new to Python and quite new to regex. (I have no Perl experience.)

I can use regular expressions well enough to get things working, but I'm not sure whether my code is especially Pythonic or concise.

For example, if I wanted to read in a text file and print the text that appears directly between the words "foo" and "bar" on each line (suppose this happened once or zero times per line), I would write the following:

fileList = open(inFile, 'r')
pattern = re.compile(r'(foo)(.*)(bar)')
for line in fileList:
    result = pattern.search(line)
    if (result != None):
        print(result.groups()[1])

Is there a better way? The if is necessary to avoid calling groups() on None. But I suspect there is a more concise way to obtain the matching string, if there is one, without raising an error when there isn't.

I'm not hoping for Perl-like unreadability. I just want to accomplish this common task in the simplest and most conventional way.

+3

python regex

Eric Wilson Mar 29 '10 at 8:45

4 answers

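There are two things you could use here: one is re.finditer (also available as a method on compiled patterns), and the other is mmap.

From the documentation for re.DOTALL, note that . does not match newlines:

without this flag, '.' will match anything except a newline.

So if you search for all matches anywhere in the file (say, after reading it into a string with f.read()), you can treat each line as an isolated substring (note: this is not entirely true; if you want ^ and $ to work per line as well, use re.MULTILINE). Because you are assuming zero or one match per line, there is no need to worry about re.finditer() returning something different from the per-line search (without that assumption, it could!). So you can replace the whole loop with finditer():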

fileList = open(inFile, 'r')
pattern = re.compile(r'foo(.*)bar')
for result in pattern.finditer(fileList.read()):
    print(result.group(1))

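This does not scale well, though. The whole file is read into memory at once, and for a big file that can be a problem. Luckily, there is a way around that! Use mmap.

mmap lets you treat a file as if it were a string (a mutable string, no less!), without reading it all into memory. In short, you can use the following code: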

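import mmap
import re

fileList = open(inFile, 'r+b')
fileS = mmap.mmap(fileList.fileno(), 0)  # length 0 maps the whole file
# mmap exposes bytes, so Python 3 needs a bytes pattern here
pattern = re.compile(br'foo(.*)bar')
for result in pattern.finditer(fileS):
    print(result.group(1))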

This behaves the same way, but without loading the whole file at once (hopefully).
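To try the mmap approach without touching a real file, here is a minimal sketch. It assumes Python 3, and the helper name between_foo_and_bar and the sample text are made up for the demonstration. It opens the file read-only with mmap.ACCESS_READ, which is a small departure from the 'r+b' mode used above.

```python
import mmap
import os
import re
import tempfile


def between_foo_and_bar(path):
    """Return the text between 'foo' and 'bar' for every line that has both."""
    pattern = re.compile(rb'foo(.*)bar')  # bytes pattern: mmap exposes bytes
    with open(path, 'rb') as f:
        # Length 0 maps the whole file; ACCESS_READ avoids needing write access.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return [m.group(1).decode('ascii') for m in pattern.finditer(mapped)]


# Hypothetical sample data written to a temporary file for the demonstration.
with tempfile.NamedTemporaryFile('wb', suffix='.txt', delete=False) as tmp:
    tmp.write(b"w1 foo first bar w2\nno match here\nfoosecondbar\n")
    sample_path = tmp.name

matches = between_foo_and_bar(sample_path)
os.remove(sample_path)
print(matches)
```

Because '.' does not match a newline without re.DOTALL, each match stays within a single line even though the pattern is applied to the whole mapped file. Note that mmap refuses to map an empty file, so a real script would need to handle that case.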

+1

Devin Jeanpierre Mar 29 '10 at 9:08

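You do not need a regex. Split the string on "bar", loop over the pieces, check each one for "foo", and split on "foo" to take what comes after it. Of course, you can also use other string operations, such as finding the index and slicing.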

>>> s="w1 w2 foo what i want bar w3 w4 foowhatiwantbar w5"
>>> for item in s.split("bar"):
...     if "foo" in item:
...         print(item.split("foo")[1:])
...
[' what i want ']
['whatiwant']
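If you want the results as plain strings rather than printed lists, the same idea could be wrapped in a small function. This is only a sketch, assuming Python 3; the name between and its parameters are hypothetical. It skips the final chunk, since that is whatever follows the last "bar" and so is not terminated by one.

```python
def between(text, start="foo", end="bar"):
    """Collect the text after `start` in each chunk that is terminated by `end`."""
    chunks = text.split(end)
    results = []
    # The final chunk is whatever follows the last `end`, so it is skipped.
    for chunk in chunks[:-1]:
        if start in chunk:
            # partition splits at the first `start`; index 2 is what follows it.
            results.append(chunk.partition(start)[2])
    return results


s = "w1 w2 foo what i want bar w3 w4 foowhatiwantbar w5"
print(between(s))
```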

0

ghostdog74 Mar 29 '10 at 8:58

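A few more points:

If there may be more than one foo ... bar pair on a line, use .*? instead of .*
If foo and bar should only match as whole words (not in foonly or rebar), add \b word boundaries (\bfoo\b and so on).
You can use lookaround to match only the text in between ((?<=\bfoo\b).*?(?=\bbar\b)), so that result.group(0) contains just that text. But that is not any more readable :)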
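The three points above can be checked side by side. This is a quick sketch assuming Python 3, and the sample strings are invented for illustration:

```python
import re

line = "foo one bar and foo two bar"

# Greedy .* runs to the last "bar" on the line; lazy .*? stops at the first.
greedy = re.search(r'foo(.*)bar', line).group(1)
lazy = re.findall(r'foo(.*?)bar', line)

tricky = "foonly rebar foo y bar"

# Without word boundaries, "foonly" and "rebar" are picked up by mistake.
loose = re.search(r'foo(.*?)bar', tricky).group(1)
bounded = re.search(r'\bfoo\b(.*?)\bbar\b', tricky).group(1)

# With lookaround the whole match, group(0), is just the text in between.
around = re.search(r'(?<=\bfoo\b).*?(?=\bbar\b)', tricky).group(0)

print(repr(greedy), lazy, repr(loose), repr(bounded), repr(around))
```

Here greedy comes out as ' one bar and foo two ' while lazy gives the two separate pieces, and only the \b versions skip foonly and rebar.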

0

Tim Pietzcker Mar 29 '10 at 9:10

kennytm · Accepted Answer · 2010-03-29T08:53:38+0000

I think it is fine as it is.

Some minor points:

You can replace result.groups()[x] with result.group(x+1).
If you do not need to capture foo and bar, just use r'foo(.*)bar'.
If you are using Python 2.5+, try the with statement so that the file is closed properly even if an exception is raised.

BTW, as a 5-liner (not that I recommend this):

import re
pattern = re.compile(r'foo(.*)bar')
with open(inFile, 'r') as fileList:
    searchResults = (pattern.search(line) for line in fileList)
    groups = (result.group(1) for result in searchResults if result is not None)
    print('\n'.join(groups))
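For a version you can run without a file on disk, here is a rough sketch of the same logic as a generator. It assumes Python 3; matches_in is a hypothetical name, and io.StringIO with made-up sample text stands in for the open file.

```python
import io
import re

pattern = re.compile(r'foo(.*)bar')


def matches_in(lines):
    """Yield the text between 'foo' and 'bar' for each line that contains both."""
    for line in lines:
        result = pattern.search(line)
        if result is not None:
            # group(1) is the first capture; it equals result.groups()[0].
            yield result.group(1)


# io.StringIO stands in for the open file here (hypothetical sample text).
sample = io.StringIO("a foo one bar b\nnothing here\nfootwobar\n")
found = list(matches_in(sample))
print('\n'.join(found))
```

Any iterable of lines works as the argument, so the file object from a with open(...) block can be passed in directly.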
