Perl 6 Regular Expression Frequency

Question

Previously, I had only worked with regular expressions in bash, grep, sed, awk, etc. After trying Perl 6 regexes, I got the impression that they run slower than I would expect, but the reason is probably that I am not using them correctly. I ran a simple test to compare similar operations in Perl 6 and bash. Here is the Perl 6 code:

 my @array = "aaaaa" .. "fffff";
 say +@array; # 7776 = 6 ** 5
 my @search = <abcde cdeff fabcd>;
 my token search { @search }
 my @new_array = @array.grep({/ <search> /});
 say @new_array;

Then I printed @array into a file named array (7776 lines), created a file named search with three lines (abcde, cdeff, fabcd), and ran a simple grep search.

 $ grep -f search array

After both programs produced the same result as expected, I measured their runtime.

 $ time perl6 search.p6
 real 0m6,683s
 user 0m6,724s
 sys 0m0,044s
 $ time grep -f search array
 real 0m0,009s
 user 0m0,008s
 sys 0m0,000s

So what am I doing wrong in my Perl 6 code?

UPD: If I pass the search strings to grep one at a time by looping over the @search array, the program runs much faster:

 my @array = "aaaaa" .. "fffff";
 say +@array;
 my @search = <abcde cdeff fabcd>;
 for @search -> $token {
     say ~@array.grep({/$token/});
 }

 $ time perl6 search.p6
 real 0m1,378s
 user 0m1,400s
 sys 0m0,052s

And if I write out each search pattern by hand, it runs even faster:

 my @array = "aaaaa" .. "fffff";
 say +@array; # 7776 = 6 ** 5
 say ~@array.grep({/abcde/});
 say ~@array.grep({/cdeff/});
 say ~@array.grep({/fabcd/});

 $ time perl6 search.p6
 real 0m0,587s
 user 0m0,632s
 sys 0m0,036s

Eugene Barsky, Oct 21 '17 at 19:30

1 answer

Brad Gilbert · Accepted Answer · 2017-10-22T02:09:15+0000

The grep command is much simpler than Perl 6 regexes, and it has had many more years of optimization. Regexes are also one of the areas of Rakudo that have not yet seen much optimization, partly because it is considered a difficult task.

For a faster version, you can precompile the regular expression:

 my $search = "/@search.join('|')/".EVAL;
 # $search = /abcde|cdeff|fabcd/;
 say ~@array.grep($search);

With this change it runs in about half a second.

If there is any chance of malicious data in @search and you still have to do this, it might be safer to use:

 "/@search».Str».perl.join('|')/".EVAL

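To see why the quoted form is safer, it can help to print the source text that would be handed to EVAL. The snippet below is only an illustrative sketch: the variable name $source and the sample entry 'a.b' are assumptions, not part of the original code. Each string is emitted as a quoted literal, so a regex metacharacter such as the dot is matched literally instead of being interpreted.

```perl6
# Hypothetical search list; 'a.b' contains a regex metacharacter.
my @search = 'abcde', 'a.b';

# Build the regex source without evaluating it.
# .perl turns each string into a double-quoted literal.
my $source = '/' ~ @search».Str».perl.join('|') ~ '/';

say $source;    # /"abcde"|"a.b"/
```
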
The compiler cannot generate such optimized code for /@search/, since @search may change after the regular expression has been compiled. What could be done is to recompile the regex into the better form the first time it is used, and then cache that form for as long as @search is not modified. (I think Perl 5 does something similar.)
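
The point about @search changing after compilation can be checked with a small experiment. This is a minimal sketch, assuming a short made-up search list and a variable named $rx that are not in the original question; it only shows that the interpolated array is consulted when the regex is used, not when it is defined.

```perl6
my @search = <abcde cdeff>;
my $rx = / @search /;       # defined once, while @search has two entries

say so 'fabcd' ~~ $rx;      # False: neither entry occurs in 'fabcd'

@search.push: 'fabcd';      # modify the array after the regex exists
say so 'fabcd' ~~ $rx;      # True: the same regex now sees the new entry
```

Because the same regex object has to give different answers before and after the push, the compiler cannot freeze the alternation into a fixed form at compile time.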

One important fact to keep in mind is that a regex in Perl 6 is just a method written in a domain-specific sub-language.
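
A quick way to convince yourself of this is to ask the type system. The sketch below assumes a throwaway regex named word, introduced only for illustration.

```perl6
# The Regex type is a subclass of Method.
say Regex ~~ Method;        # True

# A named regex is a code object like any other routine.
my regex word { \w+ }
say &word ~~ Method;        # True

# An anonymous regex stored in a variable is a Regex object as well.
my $rx = /abc/;
say $rx.^name;              # Regex
```

So every use of a regex involves a method call, which is part of why regex-heavy code has more overhead than a dedicated tool such as grep.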
