Why doesn't this recursive regular expression capture the entire block of code?

I am trying to write a recursive regular expression that captures blocks of code, but it does not seem to match them properly. I would expect the code below to capture the entire body of the function, but instead it captures only the body of the first if statement.

It almost looks as if the .+? is somehow gobbling up the first {, but it is supposed to be non-greedy, so I don't understand why it would.

What is causing it to behave this way?

Script:

use strict;
use warnings;

my $text = << "END";
int max(int x, int y)
{
    if (x > y)
    {
        return x;
    }
    else
    {
        return y;
    }
}
END

# Regular expression to capture balanced "{}" groups
my $regex = qr/
    \{              # Match opening brace
        (?:         # Start non-capturing group
            [^{}]++ #     Match non-brace characters without backtracking
            |       #     or
            (?R)    #     Recursively match the entire expression
        )*          # Match 0 or more times
    \}              # Match closing brace
/x;

# is ".+?" gobbling up the first "{"?
# What would cause it to do this?
if ($text =~ m/int\s.+?($regex)/s){
    print $1;
}

Output:

{
        return x;
    }

Expected Result:

{
    if (x > y)
    {
        return x;
    }
    else
    {
        return y;
    }
}

I know that Text::Balanced exists for this purpose, but I am trying to do this by hand in order to learn more about regular expressions.


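(?R) recurses into the whole pattern, but what is the whole pattern? When you embed the quoted $regex into /int\s.+?($regex)/, the pattern is recompiled and (?R) refers to the new, combined pattern. That is not what you intended. At the first {, the recursion then has to find another int followed by a brace block, fails, and the non-greedy .+? keeps expanding until it reaches a { whose block contains no nested braces.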
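The difference can be checked with a small experiment. This is only a sketch: the shortened sample string is made up for the demonstration and is not the text from the question. It runs the same brace pattern once on its own and once embedded in a larger pattern.

```perl
use strict;
use warnings;

# Shortened sample input, invented for this demonstration
my $text = 'int f() { if (a) { b; } }';

# Same idea as the question's pattern: (?R) recurses into the whole regex
my $regex = qr/\{(?:[^{}]++|(?R))*\}/;

# Standalone: the "whole pattern" is just the brace matcher
my $alone = $text =~ $regex ? $& : undef;

# Embedded: the "whole pattern" now begins with int\s.+?
my ($embedded) = $text =~ /int\s.+?($regex)/s;

print "alone:    $alone\n";      # alone:    { if (a) { b; } }
print "embedded: $embedded\n";   # embedded: { b; }
```

On its own, the pattern matches the outer block. Once embedded, it can only match a block without nested braces, which is the behaviour described in the question.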

I'd recommend using a named capture instead, so that you can recurse by name. Change $regex to something like

/(?<nestedbrace> ... (?&nestedbrace) ...)/
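Filled in with the brace-matching pattern from the question, the named-group form might look like the sketch below. The variable $body is a name introduced here for illustration.

```perl
use strict;
use warnings;

my $text = <<'END';
int max(int x, int y)
{
    if (x > y)
    {
        return x;
    }
    else
    {
        return y;
    }
}
END

# The recursion targets the named group only, so it keeps working
# when this pattern is interpolated into a larger one.
my $regex = qr/
    (?<nestedbrace>
        \{                      # opening brace
            (?:
                [^{}]++         # non-brace characters, no backtracking
                |
                (?&nestedbrace) # recurse into the named group
            )*
        \}                      # closing brace
    )
/x;

my $body;
if ($text =~ m/int\s.+?$regex/s) {
    $body = $+{nestedbrace};
    print "$body\n";
}
```

This prints the whole function body, from the first { to the last }.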

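If you want to avoid the extra capture, you can use the (?(DEFINE) ...) syntax to declare named patterns that can be called later: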

my $define_nestedbrace_re = qr/(?(DEFINE)
  (?<nestedbrace> ... (?&nestedbrace) ...)
)/x;

Then: /int\s.+?((?&nestedbrace))$define_nestedbrace_re/s

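That won't create additional captures. However, it is not generally possible to write fully encapsulated regex fragments. Techniques such as preferring named captures over numbered ones can help here.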
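Put together with the text from the question, the (?(DEFINE) ...) variant could be written roughly as follows. Treat it as a sketch: the elided parts of the definition are filled in with the question's brace pattern, and $body is a name introduced here.

```perl
use strict;
use warnings;

my $text = <<'END';
int max(int x, int y)
{
    if (x > y)
    {
        return x;
    }
    else
    {
        return y;
    }
}
END

# The DEFINE block never matches anything by itself;
# it only declares the named pattern for later calls.
my $define_nestedbrace_re = qr/(?(DEFINE)
    (?<nestedbrace>
        \{
            (?:
                [^{}]++
                |
                (?&nestedbrace)
            )*
        \}
    )
)/x;

my $body;
if ($text =~ m/int\s.+?((?&nestedbrace))$define_nestedbrace_re/s) {
    $body = $1;    # the explicit group around the call does the capturing
    print "$body\n";
}
```

The call (?&nestedbrace) may appear before the definition, since Perl resolves the name once the whole pattern has been compiled.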


You can change your recursive pattern to this:

/int\s+.*?  (
    \{              # Match opening brace
        (?:         # Start non-capturing group
            [^{}]++ # Match non-brace chars without backtracking
            |       # OR
            (?-1)   # Recursively match the previous group
        )*          # Match 0 or more times
    \}
)/sx
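  • (?-1) is used in place of (?R), which recurses into the whole pattern.
  • (?-1) is a relative reference that recurses into the nearest preceding capture group.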
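As a complete script, using the sample text from the question, this approach might look like the sketch below ($body is a name introduced here for illustration):

```perl
use strict;
use warnings;

my $text = <<'END';
int max(int x, int y)
{
    if (x > y)
    {
        return x;
    }
    else
    {
        return y;
    }
}
END

my $body;
if ($text =~ m/int\s+.*?  (
    \{              # Match opening brace
        (?:         # Start non-capturing group
            [^{}]++ # Match non-brace chars without backtracking
            |       # OR
            (?-1)   # Recurse into group 1 only, not the whole pattern
        )*          # Match 0 or more times
    \}              # Match closing brace
)/sx) {
    $body = $1;
    print "$body\n";
}
```

Because the recursion is limited to the capture group, the leading int\s+.*? is not part of what gets repeated, and the whole function body is captured.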



Source: https://habr.com/ru/post/1685311/

