Greedy match with a negative lookahead in a regular expression

Question


I have a regular expression with which I try to extract every group of letters that is not immediately followed by the symbol "(". For example, the following regular expression operates on a mathematical formula that includes the names of variables (x, y and z) and the names of functions (movav and movsum), both of which consist entirely of letters, but where the function names are followed by "(".

re.findall("[a-zA-Z]+(?!\()", "movav(x/2, 2)*movsum(y, 3)*z")

I would like the expression to return the list

 ['x', 'y', 'z']

but instead it returns the list

 ['mova', 'x', 'movsu', 'y', 'z']

I understand why the regex returns the second result, but is there a way to change it so that it returns the list ['x', 'y', 'z']?

+6

python

Abiel Nov 03 '11 at 17:49


4 answers

Add a word boundary, \b:

 >>> re.findall(r'[a-zA-Z]+\b(?!\()', "movav(x/2, 2)*movsum(y, 3)*z")
 ['x', 'y', 'z']

\b matches the empty string at the boundary between a word character and a non-word character, so now you are looking for letters followed by a word boundary that is not immediately followed by (. For more information, see the re docs.
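To see why the original pattern returns 'mova', it may help to run both patterns side by side. This is only a small sketch; the variable names are illustrative and not part of the question.

```python
import re

formula = "movav(x/2, 2)*movsum(y, 3)*z"

# Without \b, the greedy [a-zA-Z]+ first grabs "movav", the lookahead sees "("
# and fails, so the engine backtracks by one letter to "mova", which is
# followed by "v" rather than "(" and therefore matches.
without_boundary = re.findall(r'[a-zA-Z]+(?!\()', formula)

# With \b, the shortened candidate "mova" is rejected as well, because there
# is no word boundary between "a" and "v".
with_boundary = re.findall(r'[a-zA-Z]+\b(?!\()', formula)

print(without_boundary)  # ['mova', 'x', 'movsu', 'y', 'z']
print(with_boundary)     # ['x', 'y', 'z']
```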

+3

Dougal Nov 03 '11 at 17:57


You need to limit the match to whole words. So use \b to match the beginning or end of a word:

 re.findall(r"\b[a-zA-Z]+\b(?!\()", "movav(x/2, 2)*movsum(y, 3)*z")

+1

ekhumoro Nov 03 '11 at 17:57


Alternative approach: find runs of letters followed by either the end of the string or a character that is neither a letter nor an opening parenthesis; then capture just the letters.

 re.findall("([a-zA-Z]+)(?:[^a-zA-Z(]|$)", "movav(x/2, 2)*movsum(y, 3)*z")
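Because the pattern contains a capture group, re.findall returns the contents of that group rather than the whole match, so the consumed trailing character does not show up in the result. A quick check, as a sketch with illustrative variable names:

```python
import re

formula = "movav(x/2, 2)*movsum(y, 3)*z"
pattern = re.compile(r"([a-zA-Z]+)(?:[^a-zA-Z(]|$)")

# findall returns only group 1 (the letters) ...
names = pattern.findall(formula)

# ... while the full match also includes the consumed trailing character
# (nothing extra for "z", which is matched by the zero-width $).
full_matches = [m.group(0) for m in pattern.finditer(formula)]

print(names)         # ['x', 'y', 'z']
print(full_matches)  # ['x/', 'y,', 'z']
```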

+1

Karl Knechtel Nov 03 '11 at 18:13


taleinat · Accepted Answer · 2011-11-03T18:11:36+0000

Another solution that does not depend on word boundaries:

Make sure that the letters are not followed by either ( or another letter.

 >>> re.findall(r'[a-zA-Z]+(?![a-zA-Z(])', "movav(x/2, 2)*movsum(y, 3)*z")
 ['x', 'y', 'z']
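One difference from the \b versions may matter if names can contain digits or underscores, since \b treats those as word characters. The formula below is a made-up input, not taken from the question, and the comparison is only a sketch:

```python
import re

# Hypothetical formula in which names contain a digit or an underscore.
formula = "f2(x1) + my_func(y)"

# \b requires a word boundary, and digits/underscores count as word
# characters, so "f", "x", "my" and "func" are all rejected.
boundary = re.findall(r'\b[a-zA-Z]+\b(?!\()', formula)

# The lookahead-only pattern just checks that the next character is neither
# a letter nor "(", so the letter prefixes "f", "x" and "my" are returned.
lookahead_only = re.findall(r'[a-zA-Z]+(?![a-zA-Z(])', formula)

print(boundary)        # ['y']
print(lookahead_only)  # ['f', 'x', 'my', 'y']
```

Neither result is ideal for this kind of input; if identifiers may contain digits or underscores, the character class itself would probably need to be widened.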
