Is there a Python equivalent for Perl `study`?

Question


`study` takes extra time to study SCALAR (`$_` if unspecified) in anticipation of doing many pattern matches on the string before it is next modified. This may or may not save time, depending on the nature and number of patterns you are searching on, and on the distribution of character frequencies in the string to be searched.

I am trying to speed up some regular-expression parsing that I am doing in Python, and I recalled this Perl trick. I understand that I will have to benchmark to determine whether there is any speedup, but I cannot find an equivalent method in Python.

+4

optimization python regex perl

bonsaiviking Mar 05 '12 at 21:23


2 answers

Perl's `study` doesn't really do much anymore. The regex compiler has gotten a whole lot smarter than it was when `study` was created.

For example, it compiles alternations into a trie structure with Aho-Corasick prediction.

Run with `perl -Mre=debug` to see the kinds of cleverness the regex compiler and execution engine apply.

+8

tchrist Mar 05 '12 at 22:15


Dougal · Accepted Answer · 2012-03-05T21:46:52+0000

As far as I know, nothing like this is built into Python. But according to perldoc:

The way `study` works is this: a linked list of every character in the string to be searched is made, so we know, for example, where all the "k" characters are. From each search string, the rarest character is selected, based on some static frequency tables constructed from some C programs and English text. Only those places that contain this "rarest" character are examined.

That doesn't sound very sophisticated, and you could probably hack together something equivalent yourself.
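
As a rough illustration of what such a hack could look like, here is a minimal sketch for literal search strings only (not full regexes). The frequency ranking `COMMON_FIRST` and all function names are made up for this example; they only stand in for Perl's static tables and are not how Perl implements `study`.

```python
from collections import defaultdict

# Hypothetical frequency ranking: characters earlier in this string are
# assumed to be more common. A stand-in for Perl's static frequency tables.
COMMON_FIRST = " etaoinshrdlcumwfgypbvkjxqz"


def build_index(text):
    """Map each character to the ascending list of positions where it occurs."""
    index = defaultdict(list)
    for pos, ch in enumerate(text):
        index[ch].append(pos)
    return index


def rarest_char_offset(needle):
    """Return the offset within needle of the character assumed to be rarest."""
    def rank(ch):
        i = COMMON_FIRST.find(ch.lower())
        # Characters missing from the table are treated as the rarest.
        return i if i >= 0 else len(COMMON_FIRST)

    return max(range(len(needle)), key=lambda i: rank(needle[i]))


def find_all(text, index, needle):
    """Return start positions of a literal needle, checking only the
    places where its rarest character occurs in text."""
    if not needle:
        return []
    off = rarest_char_offset(needle)
    hits = []
    for pos in index.get(needle[off], ()):
        start = pos - off
        if start >= 0 and text.startswith(needle, start):
            hits.append(start)
    return hits


text = "the quick brown fox jumps over the lazy dog"
index = build_index(text)      # built once per string, like study
print(find_all(text, index, "the"))
print(find_all(text, index, "lazy"))
```

Whether this beats plain `str.find` or `re` is something you would have to benchmark; building the index is not free, and the built-in searches run in C.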

esmre is a bit similar. And as @Frg pointed out, you'll want to use `re.compile` if you're reusing a single regex (to avoid re-parsing the regex itself over and over).
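
For the `re.compile` point, a small sketch of what that usually looks like. The pattern and the `parse_pairs` helper are invented for illustration; the idea is just to compile once, outside the loop, and reuse the compiled object.

```python
import re

# Hypothetical pattern, compiled once at import time and reused below.
KEY_NUMBER = re.compile(r"(\w+)=(\d+)")


def parse_pairs(lines):
    """Extract (key, number) pairs from each line with the precompiled pattern."""
    result = []
    for line in lines:
        for key, value in KEY_NUMBER.findall(line):
            result.append((key, int(value)))
    return result


print(parse_pairs(["a=1 b=22", "nothing here", "c=3"]))
```

Note that the `re` module keeps an internal cache of recently compiled patterns, so the gain over calling `re.findall(pattern, line)` directly may be modest; again, benchmark.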

Or you might consider using suffix trees (here is one implementation, or here is a C extension with Unicode support) or suffix arrays (implementation).
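
To give an idea of the suffix-array approach, here is a deliberately naive sketch that is not taken from any of the linked implementations. The function names are mine, and the construction sorts whole suffixes, which is fine for short strings but far too slow and memory-hungry for large inputs; a real library would build the array much more efficiently.

```python
def build_suffix_array(text):
    """Naive construction: sort suffix start positions by the suffix text."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, needle):
    """Return ascending start positions of a literal needle, located by
    two binary searches over the suffix array."""
    m = len(needle)

    # Lower bound: first suffix whose m-character prefix is >= needle.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < needle:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose m-character prefix is > needle.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= needle:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[first:lo])


text = "banana"
sa = build_suffix_array(text)   # built once, then reused for many searches
print(find_occurrences(text, sa, "ana"))
```

Like `study`, this only pays off when the same string is searched many times, and it handles literal substrings rather than regular expressions.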
