Python re: saving multiple matches in variables

I want to match different parts of a string and store them in separate variables for later use. For instance,

    string = "bunch(oranges, bananas, apples)"
    rxp = "[a-z]*\([var1]\, [var2]\, [var3]\)"

so that I have

    var1 = "oranges"
    var2 = "bananas"
    var3 = "apples"

Something like what re.search() does, but for several different parts of the same match.

EDIT: The number of fruits in the list is not known in advance. I should have mentioned this in the question.

4 answers

This is exactly what re.search does. Just use capture groups (parentheses) to access the text that was matched by particular subpatterns:

    >>> import re
    >>> m = re.search(r"[a-z]*\(([a-z]*), ([a-z]*), ([a-z]*)\)", string)
    >>> m.group(0)
    'bunch(oranges, bananas, apples)'
    >>> m.group(1)
    'oranges'
    >>> m.group(2)
    'bananas'
    >>> m.group(3)
    'apples'

Also note that I used a raw string to avoid doubling the backslashes.
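To illustrate the raw-string point, here is a small sketch. The variable names are made up for the example; the two spellings should produce the same pattern.

```python
import re

# A raw string keeps backslashes literal, so each regex escape is written once.
raw_pattern = r"[a-z]*\(([a-z]*)\)"
# Without the r prefix, every backslash has to be doubled.
escaped_pattern = "[a-z]*\\(([a-z]*)\\)"

same = raw_pattern == escaped_pattern
first = re.search(raw_pattern, "bunch(oranges)").group(1)
print(same, first)
```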

If the number of "variables" inside the bunch can vary, you have a problem. Most regex engines cannot capture a variable number of strings. However, in this case, you can get away with this:

    >>> m = re.search(r"[a-z]*\(([a-z, ]*)\)", string)
    >>> m.group(1)
    'oranges, bananas, apples'
    >>> m.group(1).split(', ')
    ['oranges', 'bananas', 'apples']
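If the spacing around the commas may vary, the same capture-then-split idea can be wrapped in a small helper. This is only a sketch: the function name parse_bunch and the choice to return an empty list when nothing matches are assumptions, not something from the question.

```python
import re

def parse_bunch(text):
    """Return the comma-separated items inside the first name(...) group."""
    # Capture everything between the parentheses in one group.
    m = re.search(r"[a-z]*\(([^()]*)\)", text)
    if m is None:
        return []
    inner = m.group(1).strip()
    if not inner:
        return []
    # Split on commas, tolerating any whitespace around them.
    return re.split(r"\s*,\s*", inner)

fruits = parse_bunch("bunch(oranges,bananas ,  apples)")
print(fruits)
```

The result is a list, so it can be indexed or unpacked as needed once the number of items is known.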

If you want, you can use groupdict to store the matched elements in a dictionary:

    import re
    # The trailing \) keeps the closing parenthesis out of the last group.
    regex = re.compile(r"[a-z]*\((?P<var1>.*), (?P<var2>.*), (?P<var3>.*)\)")
    match = regex.match("bunch(oranges, bananas, apples)")
    if match:
        match.groupdict()
        # {'var1': 'oranges', 'var2': 'bananas', 'var3': 'apples'}

With regular expressions, you can use the match() function to do what you want, and use groups to get your results. Also, do not use string as a variable name, since it is the name of a standard library module. For your example, if you know that there is always the same number of fruits, it looks like this:

    import re
    text = "bunch(oranges, bananas, apples)"
    var1, var2, var3 = re.match(r'bunch\((\w+), (\w+), (\w+)\)', text).group(1, 2, 3)

Here I used the special sequence \w, which matches any alphanumeric character or underscore, as described in the documentation.
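As a quick check of what \w covers, a minimal sketch (the sample string is made up for illustration):

```python
import re

# Letters, digits and underscores belong to \w; commas, spaces and "!" do not.
sample = "oranges, blood_oranges2, apples!"
words = re.findall(r"\w+", sample)
print(words)
```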

If you don't know the number of fruits in advance, you can use two regular expression calls: one to extract just the part of the string where the fruits are listed, dropping "bunch" and the parentheses, and then finditer to extract the fruit names:

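    import re
    text = "bunch(oranges, bananas, apples)"
    # First isolate the text between the parentheses, then pull out each name.
    inner = re.match(r'bunch\(([^)]*)\)', text).group(1)
    [m.group(0) for m in re.finditer(r'\w+', inner)]
    # ['oranges', 'bananas', 'apples']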

Don't. Every time you find yourself using var1, var2, etc., what you really need is a list. Unfortunately, there is no way to collect an arbitrary number of subgroups of a single match in a list using findall, but you can use a hack like this:

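    import re
    string = "bunch(oranges, bananas, apples)"
    lst = []
    # The replacement callback is used only for its side effect of collecting matches.
    re.sub(r'([a-z]+)(?=[^()]*\))', lambda m: lst.append(m.group(1)) or '', string)
    print(lst)
    # ['oranges', 'bananas', 'apples']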

Note that this works not only for this particular example, but for any number of substrings.
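For comparison, since each name here is a separate match rather than a subgroup of one match, the same lookahead pattern can also be handed to re.findall, which avoids the side effect in the callback. This is a sketch that assumes the names always sit inside a single, non-nested pair of parentheses:

```python
import re

string = "bunch(oranges, bananas, apples)"
# Match a run of lowercase letters only if a ")" follows before any other parenthesis.
lst = re.findall(r"[a-z]+(?=[^()]*\))", string)
print(lst)
```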


Source: https://habr.com/ru/post/1446821/

