Like Mr. Obama said: "Yes, we can!"
I found a solution that does not require an additional module and handles every possible occurrence of a capture group (as far as I know). As ikegami mentioned, this needs a regular expression parser, but perl does the parsing for us.
While digging through the haystack of Perl modules on CPAN, I found a very interesting one: warnings::regex::recompile. It emits a warning every time a regexp is recompiled. Reading its source gave me the solution to my problem.
With use re qw/Debug DUMP/; in effect, Perl prints the compiled form of the regular expression to STDERR. The module writes that output to a real file and then reads it back for processing. I changed the code to capture it in memory instead.
My solution:
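    sub dumpre {
        use re qw(eval Debug DUMP);

        # Save the real STDERR, then point STDERR at an in-memory buffer
        my $buf = '';
        open OLDERR, '>&', STDERR or die "$!";
        close STDERR or die "$!";
        open STDERR, '>', \$buf or die "$!";

        # Compiling the pattern writes the dump to STDERR, i.e. into $buf
        my $re = qr/$_[0]/;

        # Restore the original STDERR
        close STDERR or die "$!";
        open STDERR, '>&', OLDERR or die "$!";
        close OLDERR or die "$!";

        no re 'debug';

        # The usage below dereferences the result as an array,
        # so return the dump as a reference to a list of lines
        return [ split /\n/, $buf ];
    }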
This function enables the DUMP debug mode while the regular expression is compiled. The eval flag allows it to process patterns that contain (?{...}) and (??{...}).
    my $re = 'aa(?:(a\d)+x)?((b\d)*d)*c*(d\d)?(e*)((f)+)(g)+';
    my $r = dumpre $re;
    print join "\n", @$r;
Result:
    Compiling REx "aa(?:(a\d)+x)?((b\d)*d)*c*(d\d)?(e*)((f)+)(g)+"
    Final program:
       1: EXACT <aa> (3)
       3: CURLYX[0] {0,1} (19)
       5: CURLYM[1] {1,32767} (16)
       9: EXACT <a> (11)
      11: POSIXU[\d] (14)
      14: SUCCEED (0)
      15: NOTHING (16)
      16: EXACT <x> (18)
      18: WHILEM (0)
      19: NOTHING (20)
      20: CURLYX[1] {0,32767} (40)
      22: OPEN2 (24)
      24: CURLYM[3] {0,32767} (35)
      28: EXACT <b> (30)
      30: POSIXU[\d] (33)
      33: SUCCEED (0)
      34: NOTHING (35)
      35: EXACT <d> (37)
      37: CLOSE2 (39)
      39: WHILEM[1/7] (0)
      40: NOTHING (41)
      41: STAR (44)
      42: EXACT <c> (0)
      44: CURLYM[4] {0,1} (55)
      48: EXACT <d> (50)
      50: POSIXU[\d] (53)
      53: SUCCEED (0)
      54: NOTHING (55)
      55: OPEN5 (57)
      57: STAR (60)
      58: EXACT <e> (0)
      60: CLOSE5 (62)
      62: OPEN6 (64)
      64: CURLYN[7] {1,32767} (74)
      66: NOTHING (68)
      68: EXACT <f> (0)
      72: WHILEM (0)
      73: NOTHING (74)
      74: CLOSE6 (76)
      76: CURLYN[8] {1,32767} (86)
      78: NOTHING (80)
      80: EXACT <g> (0)
      84: WHILEM (0)
      85: NOTHING (86)
      86: END (0)
    anchored "aa" at 0 floating "fg" at 2..9223372036854775807 (checking floating) minlen 4
So the lines containing OPEN\d+, CURLYM[\d+] or CURLYN[\d+] show the capturing parentheses (line syntax: segment_no: regex command (next segment)). (Note: CURLYX is not a capturing group; it comes from constructs such as (?:...)+.) The number after OPEN or CURLY[MN] is the sequence number of the capture group. The highest such number is the one we are looking for. In this case, it is 8.
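The extraction step described above could be sketched roughly as follows. This is only a minimal illustration, not part of the original code: count_groups is a hypothetical helper name, and it assumes the dump lines look like the ones shown in the result (optional leading whitespace, a segment number, a colon, then the command).

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper (not from the original answer): takes a reference to
# the list of dump lines and returns the highest capture group number seen.
sub count_groups {
    my ($lines) = @_;
    my @numbers = (0);    # 0 is returned when no capture group is found
    for my $line (@$lines) {
        # Assumed line shape: "  22: OPEN2 (24)" or "   5: CURLYM[1] {1,32767} (16)".
        # CURLYX is deliberately not matched, since it is non-capturing.
        if ($line =~ /^\s*\d+:\s+(?:OPEN(\d+)|CURLY[MN]\[(\d+)\])/) {
            push @numbers, $1 // $2;
        }
    }
    return max @numbers;
}

# A few lines taken from the dump above
my @sample = (
    '   3: CURLYX[0] {0,1} (19)',
    '   5: CURLYM[1] {1,32767} (16)',
    '  22: OPEN2 (24)',
    '  76: CURLYN[8] {1,32767} (86)',
);
print count_groups(\@sample), "\n";    # prints 8
```

With the dumpre function from this answer, the call would be count_groups(dumpre($re)). Since the dump format may change between Perl versions, the pattern in the helper may need adjusting.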
Unfortunately, it does not handle the case where (??{...}) returns an expression that itself contains parentheses, but that is not very important to me right now. I assume the dump format is not fixed, so it may differ from one Perl version to another. That is acceptable for me.