Last Perl won't match certain regular expressions longer than 32768 characters

I hope some Guru Pearl can state the following. This is the smallest possible example I could find that reproduces my problem:

>./perl -e 'print (("a".("f"x32767)."a") =~ /a(?:[^a]|bb)*a/)' 1 

but

 >./perl -e 'print (("a".("f"x32768)."a") =~ /a(?:[^a]|bb)*a/)' > 

and I compiled the latest version of Perl from the source to see if it fixes the problem:

 >./perl -v This is perl 5, version 20, subversion 1 (v5.20.1) built for i686-linux 

Is this a mistake (looks to me)?

+6
source share
2 answers

Add use warnings :

 use strict; use warnings; use feature qw(say); say "Version is $^V"; say +("a".("f"x32767)."a") =~ /a(?:[^a]|bb)*a/ ? 'matches' : 'no match'; say +("a".("f"x32768)."a") =~ /a(?:[^a]|bb)*a/ ? 'matches' : 'no match'; 

Outputs:

 Complex regular subexpression recursion limit (32766) exceeded at e.pl line 7. Complex regular subexpression recursion limit (32766) exceeded at e.pl line 9. Version is v5.20.1 matches no match 

From perldiag

Recursive regular subexpression limit exceeded (% d)

(W regexp) The regex engine uses recursion in complex situations where backlink tracking is required. The recursion depth is limited to 32766 or perhaps less in architectures where the stack cannot grow arbitrarily. ("Simple" and "medium" situations are processed without recursion and are not subject to restrictions). Try shortening the string you are examining; a loop in Perl code (for example, with while), and not in the regex engine; or rewrite the regular expression to make it simpler or less backward. (See Perlfaq2 for information on mastering regular expressions.)

+10
source

This is a known bug recorded since 2002, and it has not yet been fixed. Now you know that you are not the first person to encounter this error (or function, as you will see soon).

From this comment in the error report, it seems that the quantifiers ( * , + , {n,m} , {n,} ) are designed to have an upper limit on the number of repetitions, which prevents spontaneous failures when the stack is used to overflow overflow, but violates the very definition of the Kleene operator in a regular expression (repeat the pattern an arbitrary number of times) and gives the wrong answer for request 1 .

1 In contrast, the Java regex mechanism (Oracle implementation) simply allows a StackOverflowError occur for such cases, but the quantifier has an upper limit of 2 32 - 1, which is sufficient for most use cases. And there is a workaround for such cases, which should use possessive quantifiers.

The same comment also prints debug information on regular expression compilation, and the output clearly shows that * been translated to {0,32767} . It also plays on my machine (perl v5.10.1 (*), created for x86_64-linux-thread-multi).

 $ perl -Mre=debug -wce '/(A|B)*/' Compiling REx "(A|B)*" Final program: 1: CURLYM[1] {0,32767} (15) 5: TRIE-EXACT[AB] (13) <A> <B> 13: SUCCEED (0) 14: NOTHING (15) 15: END (0) minlen 0 -e syntax OK Freeing REx: "(A|B)*" 

This next test confirms the problem again and shows that perl does not allow you to specify a repetition that exceeds the limit.

 $ perl -e 'print (("a".("f"x32767)."a") =~ /a(?:[^a]|bb){0,32767}a/)' Quantifier in {,} bigger than 32766 in regex; marked by <-- HERE in m/a(?:[^a]|bb){ <-- HERE 0,32767}a/ at -e line 1. 

Creating a quantifier with *+ does not solve the problem, since the limit still exists:

 $ perl -Mre=debug -wce '/(A|B)*+/' Compiling REx "(A|B)*+" Final program: 1: SUSPEND (19) 3: CURLYM[1] {0,32767} (17) 7: TRIE-EXACT[AB] (15) <A> <B> 15: SUCCEED (0) 16: NOTHING (17) 17: SUCCEED (0) 18: TAIL (19) 19: END (0) minlen 0 -e syntax OK Freeing REx: "(A|B)*+" 
+9
source

Source: https://habr.com/ru/post/976328/


All Articles