Is there a Python equivalent for Perl `study`?

From the Perl documentation:

study takes extra time to study SCALAR ($_ if unspecified) in anticipation of doing many pattern matches on the string before it is next modified. This may or may not save time, depending on the nature and number of patterns you are searching on, and on the distribution of character frequencies in the string to be searched;

I am trying to speed up some regular expression parsing that I am doing in Python, and I remembered this Perl trick. I understand that I will have to benchmark to determine whether there is any speedup, but I cannot find an equivalent method in Python.

+4
2 answers

As far as I know, nothing like this is built into Python. But according to perldoc:

The way study works is this: a linked list of every character in the string to be searched is made, so we know, for example, where all the "k" characters are. From each search string, the rarest character is selected, based on some static frequency tables constructed from some C programs and English text. Only those places that contain this "rarest" character are examined.

That does not sound very sophisticated, and you could probably hack together something equivalent yourself.
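A minimal sketch of that idea in Python, for literal substrings only (not regular expressions). The class name `StudiedString` and its methods are hypothetical, and one assumption differs from Perl: rarity is measured from the studied text itself rather than from static frequency tables.

```python
from collections import defaultdict


class StudiedString:
    """Hypothetical helper: pre-indexes where each character occurs in a text."""

    def __init__(self, text):
        self.text = text
        # Map each character to the sorted list of offsets where it appears.
        self.positions = defaultdict(list)
        for i, ch in enumerate(text):
            self.positions[ch].append(i)

    def find_all(self, needle):
        """Return the start offsets of every occurrence of a literal needle."""
        if not needle:
            return []
        # Pick the needle character that is rarest in this text (assumption:
        # Perl uses static frequency tables instead of counting the text).
        offset, rare = min(
            enumerate(needle),
            key=lambda pair: len(self.positions.get(pair[1], ())),
        )
        hits = []
        # Only positions holding the rare character are examined.
        for pos in self.positions.get(rare, ()):
            start = pos - offset
            if start >= 0 and self.text.startswith(needle, start):
                hits.append(start)
        return hits
```

Usage would look like `StudiedString(text).find_all("kicks")`. Whether this beats plain `str.find` or `re` in practice would have to be benchmarked, since those run in C.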

esmre is vaguely similar. And as @Frg pointed out, you will want to use re.compile if you are reusing a single regex (to avoid re-parsing the regex itself over and over).
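As a rough illustration of the re.compile advice, the pattern can be compiled once at module level and reused inside the loop. The pattern and the `parse_pairs` function are made up for this example.

```python
import re

# Hypothetical pattern, compiled once and reused for every line.
KEY_VALUE = re.compile(r"(\w+)=(\d+)")


def parse_pairs(lines):
    """Collect (key, int value) pairs from lines such as 'a=1 b=22'."""
    pairs = []
    for line in lines:
        for key, value in KEY_VALUE.findall(line):
            pairs.append((key, int(value)))
    return pairs
```

Note that the `re` module already caches recently compiled patterns internally, so the gain from compiling explicitly may be modest and is worth measuring.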

Alternatively, you could use suffix trees (here is one implementation, or here is a C extension with Unicode support) or suffix arrays (implementation).
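To show what the suffix array option amounts to, here is a naive sketch that is not taken from any of the linked implementations. The function names are hypothetical, and the construction is the simple sort-all-suffixes approach, which is far too slow for large inputs but enough to show the lookup.

```python
def build_suffix_array(text):
    """Return suffix start offsets sorted by the suffix they begin (naive build)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_all(text, suffix_array, needle):
    """Return sorted start offsets of needle, using binary search on the array."""
    m = len(needle)

    def prefix(k):
        # First m characters of the k-th smallest suffix.
        start = suffix_array[k]
        return text[start:start + m]

    # Lower bound: first suffix whose prefix is >= needle.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < needle:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose prefix is > needle.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= needle:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[first:lo])
```

The array is built once per string, after which each literal lookup is a pair of binary searches, which is the same "pay up front, search many times" trade-off that study makes.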

+6

Perl's study does not really do much anymore. The regex compiler has gotten a whole lot smarter than it was when study was created.

For example, it compiles alternatives into a trie structure with Aho-Corasick prediction.

Run with perl -Mre=debug to see the kinds of cleverness the regex compiler and execution engine apply.

+8

Source: https://habr.com/ru/post/1399877/

