Python re: saving multiple matches in variables

Question

I want to match different parts of a string and store them in separate variables for later use. For instance,

    string = "bunch(oranges, bananas, apples)"
    rxp = "[a-z]*\([var1]\, [var2]\, [var3]\)"

so that I have

    var1 = "oranges"
    var2 = "bananas"
    var3 = "apples"

Something like what re.search() does, but for several different parts of the same match.

EDIT: The number of fruits in the list is not known in advance. I should have stated that in the question.

+4

python regex

Arish Nov 18 '12 at 21:16

4 answers

If you want, you can use groupdict to store the matched items in a dictionary:

    import re

    # The closing \) keeps the parenthesis out of the last group.
    regex = re.compile(r"[a-z]*\((?P<var1>.*), (?P<var2>.*), (?P<var3>.*)\)")
    match = regex.match("bunch(oranges, bananas, apples)")
    if match:
        match.groupdict()
        # {'var1': 'oranges', 'var2': 'bananas', 'var3': 'apples'}

+4

tehmisvh Nov 18 '12 at 21:33

For regular expressions, you can use the match() function to do what you want, and use groups to get your results. Also, avoid using string as a variable name, since it is also the name of a standard library module. For your example, if you know there is always the same number of fruits, it looks like this:

    import re

    text = "bunch(oranges, bananas, apples)"
    var1, var2, var3 = re.match(r'bunch\((\w+), (\w+), (\w+)\)', text).group(1, 2, 3)

Here I used the special sequence \w, which matches any alphanumeric character or underscore, as described in the documentation.
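
One caveat with the one-liner style: re.match returns None when the input does not match, so chaining .group() directly raises AttributeError. A minimal sketch of a guarded version follows; the function name parse_three is hypothetical and used only for illustration.

```python
import re

def parse_three(text):
    """Return the three captured names as a tuple, or None if text does not match."""
    m = re.match(r'bunch\((\w+), (\w+), (\w+)\)', text)
    if m is None:
        # No match: signal it explicitly instead of raising AttributeError.
        return None
    # group() with several arguments returns a tuple of those groups.
    return m.group(1, 2, 3)

print(parse_three("bunch(oranges, bananas, apples)"))
print(parse_three("no fruit here"))
```

The first call prints the tuple of three names and the second prints None, so the caller can decide how to handle input that does not fit the pattern.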

If you don't know the number of fruits in advance, you can use two regex calls: one to extract the part of the string where the fruits are listed, dropping the "bunch" and the parentheses, and then finditer to extract the fruit names:

    import re

    text = "bunch(oranges, bananas, apples)"
    inner = re.match(r'bunch\(([^)]*)\)', text).group(1)
    # Match only the names, so the ", " separators are not included.
    [m.group(0) for m in re.finditer(r'\w+', inner)]

+1

acjay Nov 18 '12 at 21:21

Don't do this. Whenever you find yourself using var1, var2, and so on, what you really need is a list. Unfortunately, there is no way to collect an arbitrary number of subgroups into a list using findall, but you can use a hack like this:

    import re

    string = "bunch(oranges, bananas, apples)"
    lst = []
    re.sub(r'([a-z]+)(?=[^()]*\))', lambda m: lst.append(m.group(1)), string)
    print(lst)  # ['oranges', 'bananas', 'apples']

Note that this works not only for this particular example, but for any number of substrings.
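
For this particular input, the same list can probably be obtained without the side-effecting callback. The sketch below is an alternative and not part of the answer above: it assumes the lookahead pattern is written without a capturing group, in which case re.findall returns the whole matches directly.

```python
import re

string = "bunch(oranges, bananas, apples)"

# [a-z]+ matches a lowercase word; the lookahead requires that a closing
# parenthesis follows with no other parenthesis in between, which rules
# out the word "bunch" in front of the opening parenthesis.
fruits = re.findall(r'[a-z]+(?=[^()]*\))', string)
print(fruits)
```

This does not lift the limitation on repeated capturing groups; it only sidesteps it by matching each item separately.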

+1

georg Nov 18 '12 at 21:22

Martin ender · Accepted Answer · 2012-11-18T21:19:39+0000

This is exactly what re.search does. Just use capturing groups (parentheses) to access the text that was matched by particular subpatterns afterwards:

    >>> import re
    >>> m = re.search(r"[a-z]*\(([a-z]*), ([a-z]*), ([a-z]*)\)", string)
    >>> m.group(0)
    'bunch(oranges, bananas, apples)'
    >>> m.group(1)
    'oranges'
    >>> m.group(2)
    'bananas'
    >>> m.group(3)
    'apples'

Also note that I used a raw string to avoid doubling the backslashes.

If the number of "variables" inside bunch can vary, you have a problem. Most regex engines cannot capture a variable number of strings. However, in this case you can get away with this:

    >>> m = re.search(r"[a-z]*\(([a-z, ]*)\)", string)
    >>> m.group(1)
    'oranges, bananas, apples'
    >>> m.group(1).split(', ')
    ['oranges', 'bananas', 'apples']
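
The capture-then-split approach could be wrapped in a small helper so that the no-match and empty-list cases are handled in one place. This is only a sketch: the function name bunch_items is hypothetical, and it assumes items are lowercase words separated by exactly ", ".

```python
import re

def bunch_items(text):
    """Return the comma-separated items inside the parentheses as a list."""
    m = re.search(r"[a-z]*\(([a-z, ]*)\)", text)
    if m is None:
        # Nothing that looks like name(...) was found.
        return []
    inner = m.group(1)
    # "".split(", ") would give [""], so handle empty parentheses separately.
    return inner.split(", ") if inner else []

print(bunch_items("bunch(oranges, bananas, apples)"))
print(bunch_items("bunch()"))
```

Returning an empty list in both edge cases keeps the call site free of None checks, at the cost of not distinguishing "no match" from "empty bunch".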
