How to improve Python regex syntax?

I am very new to Python and quite new to regex. (I have no Perl experience.)

I can get regular expressions to work, but I'm not sure whether my code is particularly Pythonic or concise.

For example, if I wanted to read in a text file and print the text that appears directly between the words "foo" and "bar" on each line (suppose this happened once or zero times per line), I would write the following:

fileList = open(inFile, 'r')
pattern = re.compile(r'(foo)(.*)(bar)')
for line in fileList:
    result = pattern.search(line)
    if (result != None):
        print result.groups()[1]

Is there a better way? The if is needed to avoid calling groups() on None. But I suspect there is a more concise way to get the matching string when there is one, without raising errors when there isn't.

I'm not hoping for Perl-like unreadability. I just want to accomplish this common task in the most common and simplest way.

+3
4 answers

I think your code is fine as it is.

Some minor points:

  • You can replace result.groups()[x] with result.group(x+1).
  • If you do not need to capture foo and bar, just use r'foo(.*)bar'.
  • If you are using Python 2.5+, try using the with statement, so that the file is closed properly even if an exception is raised.

BTW, as a 5-liner (not that I recommend this):
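Applied together, the three points above might look roughly like this. It is only a sketch: the function name between_foo_and_bar, the sample lines, and the in_file path in the comment are made up for illustration.

```python
import re

# No capture groups around the literal words; only the text between them is captured.
pattern = re.compile(r'foo(.*)bar')

def between_foo_and_bar(lines):
    """Yield the text between 'foo' and 'bar' for each line that has a match."""
    for line in lines:
        result = pattern.search(line)
        if result is not None:
            yield result.group(1)  # same value as result.groups()[0]

# With a real file (in_file is a hypothetical path):
# with open(in_file) as f:
#     for text in between_foo_and_bar(f):
#         print(text)

sample = ["a foo one bar b\n", "no match here\n", "fooTWObar\n"]
for text in between_foo_and_bar(sample):
    print(text)
```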

import re
pattern = re.compile(r'foo(.*)bar')
with open(inFile, 'r') as fileList:
  searchResults = (pattern.search(line) for line in fileList)
  groups = (result.group(1) for result in searchResults if result is not None)
  print '\n'.join(groups)
+3

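There are two tricks here: the first is the re.finditer regex function (and method). The second is the mmap module.

From the documentation for re.DOTALL, we can see that . does not match newlines:

Without this flag, '.' will match anything except a newline.

So if you search for all matches anywhere in the file (for example, after reading it into a string with f.read()), you can treat each line as an isolated substring (note: this is not quite true. If you want ^ and $ to work that way as well, use re.MULTILINE). Now, since we are assuming zero or one matches per line, we do not have to worry about re.finditer() matching more than it should (because it will!). So, right away, you could replace all of that with finditer():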
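A quick way to see the effect of re.DOTALL on '.' (a small sketch; the sample string is made up):

```python
import re

text = "foo one bar\nfoo two bar\n"

# Default: '.' stops at each newline, so every line produces its own match.
per_line = [m.group(1) for m in re.finditer(r'foo(.*)bar', text)]

# With re.DOTALL, '.' also matches '\n', so the greedy .* runs from the
# first "foo" to the last "bar" in the whole text.
whole = [m.group(1) for m in re.finditer(r'foo(.*)bar', text, re.DOTALL)]

print(per_line)
print(whole)
```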

import re

fileList = open(inFile, 'r')
pattern = re.compile(r'foo(.*)bar')
for result in pattern.finditer(fileList.read()):
    print result.group(1)  # group(1) is the text between foo and bar

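This is not ideal, though. The problem is that the entire file is read into memory. It would be nice to have a convenient way to do this without breaking on larger files. And there is one: the mmap module.

mmap lets you treat a file as if it were a string (a mutable string, no less!), without loading the whole thing into memory. In short, you can use the following code instead: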

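import mmap
import re

fileList = open(inFile, 'r+b')
# Length 0 maps the whole file; the pattern then scans the mapping like a string.
fileS = mmap.mmap(fileList.fileno(), 0)
pattern = re.compile(r'foo(.*)bar')
for result in pattern.finditer(fileS):
    print result.group(1)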

It will work just the same, but without reading the whole file in at once (hopefully).
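For anyone on Python 3, here is a hedged sketch of the same mmap idea. There the mapping exposes bytes rather than str, so the pattern has to be a bytes pattern. The function name find_in_file and the temporary demo file are made up for illustration, and the sketch assumes a non-empty, ASCII-only file (mapping an empty file raises ValueError).

```python
import mmap
import os
import re
import tempfile

# Bytes pattern: under Python 3 an mmap object exposes bytes, not str.
pattern = re.compile(br'foo(.*)bar')

def find_in_file(path):
    """Return the captured text of every match, scanning the file via mmap."""
    with open(path, 'rb') as f:
        # Length 0 maps the whole file; ACCESS_READ avoids needing write access.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return [m.group(1).decode('ascii') for m in pattern.finditer(mapped)]

# Demo on a throwaway file.
fd, demo_path = tempfile.mkstemp()
os.write(fd, b"foo one bar\nnothing here\nfootwobar\n")
os.close(fd)
try:
    demo_result = find_in_file(demo_path)
    print(demo_result)
finally:
    os.remove(demo_path)
```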

+1

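You don't need a regex. Split the string on "bar", iterate over the pieces, check each one for "foo", and if it is there, split on "foo" and take what follows. Of course, there are other ways to do it, such as finding the indexes of the two words and slicing, and so on.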

>>> s="w1 w2 foo what i want bar w3 w4 foowhatiwantbar w5"
>>> for item in s.split("bar"):
...     if "foo" in item:
...         print item.split("foo")[1:]
...
[' what i want ']
['whatiwant']
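A related no-regex variant, offered only as a sketch, uses str.partition and returns None when either word is missing. The function name between is made up here. Note that it stops at the first "bar" after "foo", unlike the greedy foo(.*)bar pattern.

```python
def between(line, start="foo", end="bar"):
    """Return the text between the first `start` and the next `end`, or None."""
    _, found_start, rest = line.partition(start)
    if not found_start:
        return None
    middle, found_end, _ = rest.partition(end)
    if not found_end:
        return None
    return middle

s = "w1 w2 foo what i want bar w3 w4 foowhatiwantbar w5"
print(repr(between(s)))
print(repr(between("no markers here")))
```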
0

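Some additions:

  • If foo and bar can occur more than once on a line, use .*? instead of .*
  • If foo and bar should match only as whole words (and not inside foonly or rebar), use \b word boundaries (\bfoo\b etc.)
  • You can use lookaround to match only the text itself ((?<=\bfoo\b).*?(?=\bbar\b)), so that result.group(0) contains the match. But that is not any more readable :)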
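The differences between these variants can be checked on a made-up line (a sketch; the sample text is only for illustration):

```python
import re

line = "foo A bar and foo B bar, but foonly C rebar"

# Greedy: runs from the first "foo" to the last "bar" on the line.
greedy = re.search(r'foo(.*)bar', line).group(1)

# Lazy: each match stops at the nearest following "bar".
lazy = re.findall(r'foo(.*?)bar', line)

# Word boundaries: "foonly" and "rebar" no longer count.
whole_words = re.findall(r'\bfoo\b(.*?)\bbar\b', line)

# Lookaround: the whole match, group(0), is just the text in between.
lookaround = re.search(r'(?<=\bfoo\b).*?(?=\bbar\b)', line).group(0)

print(repr(greedy))
print(lazy)
print(whole_words)
print(repr(lookaround))
```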
0

Source: https://habr.com/ru/post/1738927/

