Regex that matches as much as possible?

Is there a convenient way to write a regular expression that tries to match as many of a sequence of regular expressions as possible?

Example:

 my $re = qr/a ([a-z]+) (\d+)/;
 match_longest($re, "a")          => ()
 match_longest($re, "a word")     => ("word")
 match_longest($re, "a word 123") => ("word", "123")
 match_longest($re, "a 123")      => ()

That is, $re is treated as a sequence of regular expressions, and match_longest tries to match as much of that sequence as it can. In a sense, a match never fails - it is just a question of how many of the parts matched. Once one regular expression in the sequence fails to match, the parts that did not match are undef.

I know that I can write a function that takes a sequence of regular expressions and builds a single regular expression that does the match_longest job. Here is a sketch of the idea:

Suppose you have three regular expressions: $r1, $r2 and $r3. The single regular expression that does the match_longest job would have the following structure:

 $r = ($r1 $r2 $r3)? | $r1 ($r2 $r3) | $r1 $r2 $r3? 

Unfortunately, the size of this expression is quadratic in the number of regular expressions, because each alternative repeats part of the sequence. Is there a more efficient way?

+6
3 answers

You can use the regex

 $r = ($r1 ($r2 ($r3)?)?)? 

which contains each regular expression only once, so its size is linear in the number of regular expressions. You can also use non-capturing groups (?:...) for the added nesting so that it does not interfere with the capture groups of your original regular expressions.
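
The nested structure can be generated from a list of patterns rather than written by hand. The following is only a minimal sketch: build_longest is a hypothetical helper name, match_longest is one possible reading of the function in the question, and anchoring the pattern at the start of the string with ^ is an assumption, not something the question specifies.

```perl
use strict;
use warnings;

# Hypothetical helper: nest the patterns from right to left so that each
# one appears exactly once: (?:$r1(?:$r2(?:$r3)?)?)?
sub build_longest {
    my @parts = @_;
    my $nested = '';
    for my $part (reverse @parts) {
        # Non-capturing wrapper, so only the parts' own groups capture.
        $nested = "(?:$part$nested)?";
    }
    # Assumption: the sequence must start at the beginning of the string.
    return qr/^$nested/;
}

# One possible match_longest: return the captures of the parts that matched.
sub match_longest {
    my ($re, $str) = @_;
    # In list context the match returns ($1, $2, ...); groups that did not
    # take part in the match are undef and are filtered out here.
    my @caps = ($str =~ $re);
    return grep { defined } @caps;
}

my $re = build_longest(qr/a/, qr/ ([a-z]+)/, qr/ (\d+)/);
print join(", ", match_longest($re, "a word 123")), "\n";
```

Note that the separating spaces are part of the second and third patterns here, so that "a" on its own still matches the first part. The sketch also assumes that every pattern after the first contains at least one capture group.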

+5

If I understand the question correctly, nested groups with ? should work:

 my $re = qr/a ((\w+) (\d+)?)?/; 
+2

This particular case can be written as follows:

 m/a (?:(\w+)(?: (\d+))?)?/ 
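
To see how this pattern behaves on the inputs from the question, here is a small check. It is a sketch only: the captures helper is a hypothetical name introduced for illustration. One thing to keep in mind is that \w also matches digits, so for "a 123" the first group captures "123", whereas the question's [a-z]+ version expects an empty result there.

```perl
use strict;
use warnings;

my $re = qr/a (?:(\w+)(?: (\d+))?)?/;

# Hypothetical helper: return only the groups that took part in the match.
sub captures {
    my ($str) = @_;
    return grep { defined } ($str =~ $re);
}

for my $input ("a", "a word", "a word 123", "a 123") {
    printf "%-14s => (%s)\n", qq{"$input"}, join(", ", captures($input));
}
```

Replacing \w+ with [a-z]+ would bring the last case in line with the question's example.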
0

Source: https://habr.com/ru/post/895019/

