Some more theoretical information of possible interest.
For clarity, let's assume the standard definition of a regular expression
http://en.wikipedia.org/wiki/Regular_language
from formal language theory. In practice, this means that the only building material is the symbols of the alphabet and the operators of concatenation, alternation, and Kleene star, along with the unit and zero constants (which appear for group-theoretic reasons). It is generally a good idea not to overload this term, despite the everyday practice in scripting languages, which leads to ambiguity.
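As a quick illustration of how far the scripting-language usage strays from this definition, Python's re module (used here only as a familiar example) happily recognises the copy language { ww : w nonempty } via a backreference, although the pumping lemma shows that language is not regular:

    import re

    # A backreference takes us outside the regular languages: (.+)\1
    # recognises the copy language { ww : w nonempty }, which is
    # provably not regular.
    copy = re.compile(r"(.+)\1")

    print(bool(copy.fullmatch("hodorhodor")))  # True  (w = "hodor")
    print(bool(copy.fullmatch("hodorhodo")))   # False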
There is an NFA construction that solves the matching problem for a regular expression r and an input text t in O(|r| |t|) time and O(|r|) space, where |-| is the length function. This algorithm was further improved by Myers
http://doi.acm.org/10.1145/128749.128755
to O(|r| |t| / log |t|) time and space complexity, using automaton node listings and the Four Russians paradigm. The paradigm seems to be named after the four Russian guys who wrote a groundbreaking paper that is not available online. However, the paradigm is illustrated in these computational biology lecture notes:
http://lyle.smu.edu/~saad/courses/cse8354/lectures/lecture5.pdf
I find it amusing that a paradigm is named after the number and nationality of its authors instead of their last names.
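For reference, here is a minimal sketch (with hypothetical helper names) of the classic baseline, a Thompson-style construction followed by breadth-first simulation. It is not Myers' improved algorithm, and it supports only literals, '(', ')', '|' and '*' on a well-formed, nonempty expression, but it shows where the O(|r| |t|) bound comes from: each of the |t| input characters triggers O(|r|) work on a set of automaton states.

    def parse(regex):
        """Compile regex into an epsilon-NFA.  States are integers;
        eps[s] lists epsilon-successors of s, and step[s] is an
        optional (char, successor) pair.  Returns the tuple
        (start, accept, eps, step)."""
        eps, step = {}, {}
        pos = 0

        def new_state():
            s = len(eps)
            eps[s], step[s] = [], None
            return s

        def peek():
            return regex[pos] if pos < len(regex) else None

        def alternation():
            nonlocal pos
            start, accept = concatenation()
            while peek() == '|':
                pos += 1
                s2, a2 = concatenation()
                s, a = new_state(), new_state()
                eps[s] += [start, s2]          # branch into either arm
                eps[accept].append(a)
                eps[a2].append(a)
                start, accept = s, a
            return start, accept

        def concatenation():
            start, accept = starred()
            while peek() not in (None, '|', ')'):
                s2, a2 = starred()
                eps[accept].append(s2)         # chain the pieces
                accept = a2
            return start, accept

        def starred():
            nonlocal pos
            start, accept = atom()
            while peek() == '*':
                pos += 1
                s, a = new_state(), new_state()
                eps[s] += [start, a]           # zero iterations allowed
                eps[accept] += [start, a]      # loop back or leave
                start, accept = s, a
            return start, accept

        def atom():
            nonlocal pos
            if peek() == '(':
                pos += 1
                start, accept = alternation()
                assert peek() == ')'
                pos += 1
                return start, accept
            c = regex[pos]
            pos += 1
            s, a = new_state(), new_state()
            step[s] = (c, a)                   # consume one literal
            return s, a

        start, accept = alternation()
        return start, accept, eps, step

    def closure(states, eps):
        """Everything reachable from states by epsilon moves alone."""
        stack, seen = list(states), set(states)
        while stack:
            for t in eps[stack.pop()]:
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
        return seen

    def matches(regex, text):
        start, accept, eps, step = parse(regex)
        current = closure({start}, eps)
        for c in text:                         # |t| iterations...
            moved = {step[s][1] for s in current
                     if step[s] and step[s][0] == c}
            current = closure(moved, eps)      # ...of O(|r|) work each
        return accept in current

    print(matches("(a|b)*abb", "aababb"))      # True
    print(matches("(a|b)*abb", "aababa"))      # False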
The matching problem for regular expressions with added backreferences is NP-complete, which was proven by Aho
http://portal.acm.org/citation.cfm?id=114877
by reduction from the vertex-cover problem, which is a classical NP-complete problem.
To match regular expressions with backreferences deterministically, we could employ backtracking (not unlike the Perl regex engine) by keeping track of the possible subwords of the input text t that can be assigned to the variables in r. There are only O(|t|^2) subwords that can be assigned to any one variable in r. If r has n variables, there are O(|t|^(2n)) possible assignments. Once an assignment of substrings to variables is fixed, the problem reduces to plain regular expression matching, which for fixed r and n costs O(|t|^2) per assignment, since the substituted expression has length O(|t|). Hence the worst-case complexity of matching regular expressions with backreferences is O(|t|^(2n+2)).
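To make the counting argument concrete, here is a deliberately naive sketch of that scheme for a single variable (n = 1). The function and its signature are my invention for illustration; Python's standard re.escape and re.fullmatch stand in for the "plain regular matching" step:

    import re

    def match_one_backref(prefix, suffix, text):
        """Decide the pattern  prefix(w)suffix\\1  naively, where prefix
        and suffix are plain (backreference-free) regular expressions:
        try every substring of text as the variable w, then fall back
        to plain matching.  An illustration, not a real engine."""
        n = len(text)
        for i in range(n + 1):
            for j in range(i, n + 1):          # all O(|t|^2) subwords
                w = text[i:j]
                plain = prefix + re.escape(w) + suffix + re.escape(w)
                if re.fullmatch(plain, text):  # plain matching step
                    return w
        return None

    # The pattern  a*(.*)a*\1  against "abbabb":
    print(match_one_backref("a*", "a*", "abbabb"))  # 'abb' (first hit; 'bb' also works)

With n variables the substring loop nests n times, which is exactly where the O(|t|^(2n)) count of assignments above comes from.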
Note, however, that regular expressions with backreferences still do not add up to full-featured practical regexes.
Take, for example, the "don't care" symbol, apart from any other operators. There are several polynomial algorithms that decide whether a set of such patterns matches an input text. For example, Kucherov and Rusinowitch
http://dx.doi.org/10.1007/3-540-60044-2_46
define a pattern as a word w_1 @ w_2 @ ... @ w_n, where each w_i is a word (not a regular expression) and "@" is a variable-length "don't care" symbol not contained in any of the w_i. They derive an O((|t| + |P|) log |P|) algorithm for matching a set of patterns P against an input text t, where |t| is the length of the text and |P| is the total length of all words in P.
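For intuition only, here is a naive greedy matcher for this pattern class; it is not the algorithm from the paper and runs in roughly O(|t| |P|) in the worst case. It assumes, as a simplification, that the pattern may occur anywhere in the text and that "@" may also match the empty word:

    def matches_dont_care(words, text):
        """Decide whether the pattern w_1@...@w_n occurs in text, i.e.
        whether w_1, ..., w_n occur in order without overlapping.
        Greedy: taking the earliest occurrence of each w_i is always
        safe, because '@' can absorb any amount of intervening text."""
        pos = 0
        for w in words:
            pos = text.find(w, pos)
            if pos < 0:
                return False
            pos += len(w)                      # '@' segments may be empty
        return True

    # The pattern  ab @ cd :
    print(matches_dont_care(["ab", "cd"], "xxabyycdzz"))  # True
    print(matches_dont_care(["ab", "cd"], "xxcdyyabzz"))  # False: wrong order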
It would be interesting to know how these complexity measures combine, and what the complexity of the matching problem is for regular expressions with backreferences, "don't care" symbols, and the other interesting features of practical regular expressions.
Alas, beyond the sketches above, I haven't said a word about Python... :)