Perl: "Quantifier in {,} bigger than 32766 in regex"

Let's say I want to find the letters of the word “dogs” in a long (300,000-letter) string, with exactly 40,000 letters between each pair of consecutive letters. So I write:

$mystring =~ m/d.{40000}o.{40000}g.{40000}s/; 

This works fine in other (slower) languages, but Perl throws "Quantifier in {,} bigger than 32766 in regex".

So:

  • Can we use a larger number as a quantifier?
  • If not, is there another good way to find what I want? Note that "dogs" is just an example; I want to do this for any word and any gap size (and fast).
3 answers

If you really need to do this quickly, I would look into a custom search based on the ideas behind Boyer-Moore string search. A regular expression is compiled into a finite state machine (FSM). Even a smart, compact representation of such an FSM will not be a very efficient way to perform the search you describe.

If you really want to stick with regular expressions, you can simply concatenate two sub-expressions, such as .{30000}.{10000}, which in practice matches the same thing as .{40000}.
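
The concatenation trick can be generated for any word and gap instead of being written by hand. This is a minimal sketch, not a definitive implementation: the helper names gap_pattern and spaced_regex are hypothetical, and the chunk size of 30,000 is an assumption chosen only to stay below the limit quoted in the error message.

```perl
use strict;
use warnings;

# Hypothetical helper: return a pattern equivalent to .{$gap}, written as a
# concatenation of pieces whose quantifiers are each at most $chunk.
sub gap_pattern {
    my ($gap, $chunk) = @_;
    $chunk //= 30_000;    # assumed chunk size, below the 32766 limit
    my $pattern = sprintf('.{%d}', $chunk) x int($gap / $chunk);
    my $rest    = $gap % $chunk;
    $pattern   .= sprintf('.{%d}', $rest) if $rest;
    return $pattern;
}

# Hypothetical helper: compile a regex that matches the letters of $word
# with exactly $gap arbitrary characters between consecutive letters.
sub spaced_regex {
    my ($word, $gap) = @_;
    my $separator = gap_pattern($gap);
    my $pattern   = join $separator, map { quotemeta($_) } split //, $word;
    return qr/$pattern/s;    # /s lets "." match newlines as well
}

# Usage on made-up test data: "dogs" with 40,000 filler characters per gap.
my $regex  = spaced_regex('dogs', 40_000);
my $text   = join 'x' x 40_000, split //, 'dogs';
my $offset = $text =~ $regex ? $-[0] : -1;
print "match offset: $offset\n";
```

For a gap of 40,000 the generated separator is exactly the .{30000}.{10000} form mentioned above.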


I think index might be better suited for this task. Something along these lines (completely untested):

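 sub has_dogs {
     my $str   = shift;    # reference to the string to search
     my $start = 0;
     while (-1 < (my $pos = index $$str, 'd', $start)) {
         # substr past the end of the string returns undef and warns
         no warnings qw(substr uninitialized);
         # 40,000 letters in between: the next letter is 40,001 positions on
         if (    ('o' eq substr($$str, $pos + 40_001, 1))
             and ('g' eq substr($$str, $pos + 80_002, 1))
             and ('s' eq substr($$str, $pos + 120_003, 1)) ) {
             return 1;
         }
         $start = $pos + 1;    # resume the search after this 'd'
     }
     return;
 }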
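
Since the question asks for any word and any gap, the same index/substr idea can be parameterised. The following is only a sketch under stated assumptions: the function name find_spaced and its return convention (offset of the first match, or -1) are hypothetical, and it assumes that a gap of $gap letters puts the next letter $gap + 1 positions later, as in the regex from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: return the offset of the first place where the letters
# of $word occur with exactly $gap characters between them, or -1 if none.
sub find_spaced {
    my ($str_ref, $word, $gap) = @_;
    return -1 unless length $word;

    my @chars = split //, $word;
    my $step  = $gap + 1;    # distance between consecutive letters
    # Last offset at which the whole spaced word still fits in the string.
    my $last_start = length($$str_ref) - $step * $#chars - 1;

    my $start = 0;
    while ((my $pos = index($$str_ref, $chars[0], $start)) >= 0) {
        last if $pos > $last_start;    # no room left for a full match
        my $ok = 1;
        for my $i (1 .. $#chars) {
            if (substr($$str_ref, $pos + $i * $step, 1) ne $chars[$i]) {
                $ok = 0;
                last;
            }
        }
        return $pos if $ok;
        $start = $pos + 1;    # continue after this candidate
    }
    return -1;
}

# Usage on made-up test data: two junk characters, then the spaced word.
my $haystack = 'zz' . join('x' x 40_000, split //, 'dogs') . 'zz';
print "found at: ", find_spaced(\$haystack, 'dogs', 40_000), "\n";
```

Passing a reference avoids copying the large string, and the $last_start check keeps every substr call inside the string, so no warnings need to be silenced.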

40,000 = 2 * 20,000

 /d(?:.{20000}){2}o(?:.{20000}){2}g(?:.{20000}){2}s/s 
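
Note the trailing /s in this pattern, which the regex in the question does not have. A small sketch of why it can matter, using made-up test data (a single "d" and "o" separated by 40,000 characters, one of which is a newline):

```perl
use strict;
use warnings;

# Made-up test data: "d", 40,000 filler characters, "o", with a newline in
# the middle of the filler.
my $filler = ('x' x 19_999) . "\n" . ('x' x 20_000);
my $text   = 'd' . $filler . 'o';

# Without /s, "." does not match "\n", so the gap cannot span the newline.
my $without_s = $text =~ /d(?:.{20000}){2}o/  ? 1 : 0;

# With /s, "." matches any character, including "\n".
my $with_s    = $text =~ /d(?:.{20000}){2}o/s ? 1 : 0;

print "without /s: $without_s, with /s: $with_s\n";
```

If the input is guaranteed to contain no newlines, the two forms behave the same.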

Source: https://habr.com/ru/post/915938/

