Perl 6 Regular Expression Speed

Previously, I had only worked with regular expressions in bash, grep, sed, awk, etc. After trying Perl 6 regexes, I got the impression that they run slower than I would expect, but the reason is probably that I am not using them correctly. I ran a simple test to compare similar operations in Perl 6 and bash. Here is the Perl 6 code:

    my @array = "aaaaa" .. "fffff";
    say +@array;    # 7776 = 6 ** 5

    my @search = <abcde cdeff fabcd>;
    my token search { @search }

    my @new_array = @array.grep({ / <search> / });
    say @new_array;

Then I printed @array into a file named array (7776 lines), created a file named search with three lines (abcde, cdeff, fabcd), and ran a simple grep search.

 $ grep -f search array 

Both programs produced the same result, as expected, so I measured their run times.

    $ time perl6 search.p6
    real    0m6,683s
    user    0m6,724s
    sys     0m0,044s

    $ time grep -f search array
    real    0m0,009s
    user    0m0,008s
    sys     0m0,000s

So what am I doing wrong in my Perl 6 code?

Update: If I pass the search tokens to grep one at a time by looping over the @search array, the program runs much faster:

    my @array = "aaaaa" .. "fffff";
    say +@array;

    my @search = <abcde cdeff fabcd>;
    for @search -> $token {
        say ~@array.grep({ /$token/ });
    }

    $ time perl6 search.p6
    real    0m1,378s
    user    0m1,400s
    sys     0m0,052s

And if I write out each search pattern by hand, it runs faster still:

    my @array = "aaaaa" .. "fffff";
    say +@array;    # 7776 = 6 ** 5

    say ~@array.grep({ /abcde/ });
    say ~@array.grep({ /cdeff/ });
    say ~@array.grep({ /fabcd/ });

    $ time perl6 search.p6
    real    0m0,587s
    user    0m0,632s
    sys     0m0,036s
1 answer

The grep command is much simpler than Perl 6 regular expressions, and it has had many more years of optimization. Regexes are also one of the areas that have not seen as much optimization in Rakudo, partly because it is considered a difficult thing to work on.


For a faster version, you can precompile the regular expression:

    my $search = "/@search.join('|')/".EVAL;
    # $search = /abcde|cdeff|fabcd/;
    say ~@array.grep($search);

This change makes it run in about half a second.
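For reference, the precompiled approach can be assembled into a self-contained script roughly as follows. This is only a sketch: the `use MONKEY-SEE-NO-EVAL` pragma and the `now`-based timing are additions that are not part of the snippet above. The pragma is included as a precaution, on the assumption that some Rakudo versions may ask for it when EVAL is given a string built at run time.

```perl6
use MONKEY-SEE-NO-EVAL;    # assumption: harmless if this EVAL form does not need it

my @array  = "aaaaa" .. "fffff";
my @search = <abcde cdeff fabcd>;

# Build the source text of the regex, then compile it exactly once.
my $source = '/' ~ @search.join('|') ~ '/';    # /abcde|cdeff|fabcd/
my $search = $source.EVAL;

# Time only the search itself, not compiler start-up.
my $start = now;
my @found = @array.grep($search);
say @found;
say "grep took { now - $start } seconds";
```

Timing inside the script separates the cost of the matching from the start-up cost of the perl6 executable, which `time perl6 search.p6` includes.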

If there is any chance of malicious data in @search and you have to do this, it may be safer to use:

 "/@search».Str».perl.join('|')/".EVAL 
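To see why quoting helps, it may be useful to print the regex source that this expression produces before it is evaluated. In the sketch below the sample strings `a.c` and `x|y` are made up for illustration, and the source string is built with `~` instead of interpolation only to make the steps visible.

```perl6
# Hypothetical search terms; two of them contain regex metacharacters.
my @search = 'abcde', 'a.c', 'x|y';

# .perl renders each string as a quoted literal, so characters such as
# . and | end up inside quotes and are matched literally by the regex.
my $source = '/' ~ @search».Str».perl.join('|') ~ '/';
say $source;    # /"abcde"|"a.c"|"x|y"/
```

Without the `.perl` step the same terms would produce `/abcde|a.c|x|y/`, where `.` matches any character and `x|y` splits into two separate alternatives.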

The compiler cannot generate such optimized code for /@search/, since @search may change after the regular expression has been compiled. What could happen is that the first time the regex is used it gets recompiled into a better form, which is then cached for as long as @search does not change.
(I think Perl 5 does something similar.)

One important fact to keep in mind is that a regex in Perl 6 is just a method written in a domain-specific sub-language.
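A quick way to check this from code is to ask a regex for its type. The names `word` and `$re` below are made up for this sketch.

```perl6
my regex word { \w+ }    # a named regex, declared much like a sub or method
my $re = /abc/;          # an anonymous regex object

say &word.^name;         # Regex
say $re ~~ Method;       # True, since the Regex class inherits from Method

# Despite being a kind of method, it is used like any other regex.
say "some text" ~~ &word;
```

Since each regex is compiled code and not a simple pattern table, matching carries call overhead that a dedicated tool like grep does not have.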


Source: https://habr.com/ru/post/1272885/