How to match a regular expression with a grouping with an unknown number of groups

Question

I want to run a regular expression (in Python) against a program's output log. The log contains some lines that look like this:

 ...
 VALUE 100 234 568 9233 119
 ...
 VALUE 101 124 9223 4329 1559
 ...

I would like to capture the list of numbers that occurs after the first occurrence of a line starting with VALUE, i.e., I want it to return ('100','234','568','9233','119'). The problem is that I do not know in advance how many numbers there will be.

I tried using this as a regular expression:

 VALUE (?:(\d+)\s)+

This matches the string, but only captures the last value, so I just get ('119',).
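The behavior can be reproduced with a short sketch. The sample line is taken from the log excerpt above, and the trailing newline is an assumption so that the final \s in the pattern has something to match:

```python
import re

# Sample line from the log excerpt; the trailing newline is assumed so the
# last "\s" in the pattern can match after the final number.
line = "VALUE 100 234 568 9233 119\n"

m = re.search(r"VALUE (?:(\d+)\s)+", line)

# A repeated capturing group keeps only the text of its last iteration.
last_only = m.groups()
print(last_only)
```

Each repetition of the group overwrites the previous capture, which is why only one value survives.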

+17

python regex

Lorin Hochstein Sep 10 '09 at 20:06

5 answers

 >>> import re
 >>> reg = re.compile(r'\d+')
 >>> reg.findall('VALUE 100 234 568 9233 119')
 ['100', '234', '568', '9233', '119']

This doesn't verify that the keyword "VALUE" appears at the beginning of the line, and it doesn't verify that there is exactly one space between the items, but if you can do that as a separate step (or if you don't need it at all), then it will find all digit sequences in any line.
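If the separate validation step is wanted, one possible sketch is below. The function name numbers_after_value and the validation pattern are assumptions for illustration, not part of the answer:

```python
import re

def numbers_after_value(line):
    """Return the digit runs on a line that starts with VALUE, else None."""
    # Separate validation step (assumed pattern): the keyword at the start,
    # followed by numbers separated by exactly one space each.
    if not re.match(r"VALUE( \d+)+$", line):
        return None
    # The extraction itself is the same findall call as above.
    return re.findall(r"\d+", line)

result = numbers_after_value("VALUE 100 234 568 9233 119")
print(result)
```

Lines that do not start with VALUE, or that use more than one space between numbers, return None instead of a list.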

+9

Ian Clelland Sep 10 '09 at 20:17

You could run a primary regular expression first, and then run a secondary regular expression on those matches to get the numbers:

 matches = Regex.Match(log)
 foreach (Match match in matches) {
     submatches = Regex2.Match(match)
 }

That is, of course, assuming you do not want to write a full parser.
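Since the question is about Python, the same two-pass idea might look roughly like this. The log text, the two patterns, and the variable names are all made up for illustration:

```python
import re

# Hypothetical log text built around the two sample lines from the question.
log = """startup
VALUE 100 234 568 9233 119
noise
VALUE 101 124 9223 4329 1559
shutdown
"""

# Primary regex: whole VALUE lines (MULTILINE lets ^ and $ match per line).
primary = re.compile(r"^VALUE((?: \d+)+)$", re.MULTILINE)
# Secondary regex: the individual numbers inside each primary match.
secondary = re.compile(r"\d+")

all_rows = [tuple(secondary.findall(m.group(1))) for m in primary.finditer(log)]
# The question only asks for the first VALUE line.
first_row = all_rows[0] if all_rows else ()
print(first_row)
```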

+2

Chris J Sep 10 '09 at 20:14

Another option not described here is to use a series of optional capturing groups.

 VALUE *(\d+)? *(\d+)? *(\d+)? *(\d+)? *(\d+)? *$

This regular expression captures up to five groups of digits, separated by spaces. If you need more potential groups, just copy and paste more *(\d+)? blocks.
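In Python, groups that did not take part in the match come back as None, so they need to be filtered out. A minimal sketch, using a shortened sample line that is only an illustration:

```python
import re

pattern = re.compile(r"VALUE *(\d+)? *(\d+)? *(\d+)? *(\d+)? *(\d+)? *$")

# Only three numbers here, so the last two groups stay unused.
m = pattern.match("VALUE 100 234 568")

# Unused optional groups are reported as None; drop them.
found = tuple(g for g in m.groups() if g is not None)
print(found)
```

Because of the trailing $, a line with more numbers than there are groups does not match at all, so the number of groups is a hard upper limit.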

+2

Scottmas Apr 24 '17 at 14:34

I had the same problem, and my solution was to use two regular expressions: the first to match the entire group of interest to me, and the second to parse subgroups. For example, in this case, I would start with this:

 VALUE((\s\d+)+)

This should yield three groups: [0] the entire match, [1] everything after VALUE, and [2] the last space + number.

[0] and [2] can be ignored, and then [1] can be used with the following:

 \s(\d+)

Note: these regular expressions have not been tested, hope you get this idea.
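For the sample line from the question, the two steps can be sketched in Python as follows. The variable names are only illustrative:

```python
import re

line = "VALUE 100 234 568 9233 119"

# Step 1: match the whole block of interest.
outer = re.search(r"VALUE((\s\d+)+)", line)
whole = outer.group(0)  # [0] the entire match
tail = outer.group(1)   # [1] everything after VALUE
last = outer.group(2)   # [2] only the last repetition of the inner group

# Step 2: parse the captured block with the second regex.
numbers = re.findall(r"\s(\d+)", tail)
print(numbers)
```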

The reason Greg's answer doesn't work for me is that the second part of my parsing is more complicated than just numbers separated by spaces.

However, for this question I would honestly go with Greg's solution (it is probably more efficient).

I am only writing this answer in case someone is looking for a more complex solution, as I was.

0

Christian Nov 12 '17 at 16:14

Greg Hewgill · Accepted Answer · 2009-09-10 20:12

What you are looking for is a parser, not a regular expression. In your case, I would consider using a very simple parser, split():

 s = "VALUE 100 234 568 9233 119"
 a = s.split()
 if a[0] == "VALUE":
     print([int(x) for x in a[1:]])

You can use a regex to check whether your input line matches the expected format (using the regex in your question). Then you can run the above code without checking for "VALUE", knowing that the int(x) conversion will always succeed, since you have already confirmed that the following groups of characters are all digits.
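Putting validation and split() together could look something like the sketch below. The pattern VALID and the function parse_value_line are assumptions for illustration. The pattern is loosely adapted from the regex in the question and anchored so that the whole line must match:

```python
import re

# Assumed validation pattern: VALUE followed by whitespace-separated numbers
# and nothing else up to the end of the line.
VALID = re.compile(r"VALUE(?:\s+\d+)+\s*$")

def parse_value_line(s):
    """Return the numbers on a validated VALUE line as ints, else None."""
    if not VALID.match(s):
        return None
    # After validation the first token is VALUE and the rest are all digits,
    # so neither the keyword check nor the int() conversion can fail.
    return [int(x) for x in s.split()[1:]]

parsed = parse_value_line("VALUE 100 234 568 9233 119")
print(parsed)
```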
