Assuming your regex flavor supports lookbehinds, atomic groups, and possessive quantifiers (which are PCRE features):
Some examples of what can be replaced:
all (?: by (?>
the beginning (the entire first named group) by:
^(?P<initial>(?>[csz]h?+|[bdfghj-npqrtwxy])?)
this part * by:
|(?<![csz]h)(?<=h)(?>a(?>[io]|ng?+)?|e(?>i|ng?+)?|o(?>u|ng)|u(?>[ino]|a(?>i|ng?+)?)?)
* (i.e.: |(?:(?<!sh|ch|zh)(?<=h)uang|(?<!sh|ch|...|(?<!sh|ch|zh)(?<=h)u) )
* (i.e.: |(?:(?<!r|c|b|d|g|f|h|k|j|m|l|n|q|p|s|t|w|y|x|z)a|(?<!r|c|b|d|...))$ )
How to handle the other parts:
Example:
(?:(?<=ch)uang|(?<=ch)ang|(?<=ch)eng|(?<=ch)ong|(?<=ch)uai|(?<=ch)uan|(?<=ch)ai|(?<=ch)an|(?<=ch)ao|(?<=ch)en|(?<=ch)ou|(?<=ch)ua|(?<=ch)ui|(?<=ch)un|(?<=ch)uo|(?<=ch)a|(?<=ch)e|(?<=ch)i|(?<=ch)u)
_all such parts have the same shape; you must repeat these steps for each of them_
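The factorization of the (?<=ch) example above can be sketched in Python as follows. This is only an illustration under a few assumptions: it uses plain non-capturing groups rather than atomic groups and possessive quantifiers (Python's re module only accepts those from version 3.11), and the names FINALS, ORIGINAL, FACTORED and final_of are mine, not part of the original pattern.

```python
import re

# Finals copied from the (?<=ch) example above.
FINALS = ["uang", "ang", "eng", "ong", "uai", "uan", "ai", "an", "ao", "en",
          "ou", "ua", "ui", "un", "uo", "a", "e", "i", "u"]

# Original shape: one lookbehind per alternative.
ORIGINAL = re.compile("(?:" + "|".join("(?<=ch)" + f for f in FINALS) + ")$")

# Factored shape: a single lookbehind, then common prefixes pulled out
# so that each leading letter is tested only once.
FACTORED = re.compile(
    r"(?<=ch)(?:a(?:[io]|ng?)?|e(?:ng?)?|i|o(?:u|ng)|u(?:a(?:i|ng?)?|[ino])?)$"
)

def final_of(pattern, syllable):
    """Return the final matched after 'ch', or None if there is no match."""
    m = pattern.search(syllable)
    return m.group() if m else None
```

Both patterns should accept exactly the same finals; the factored one simply reaches a decision with fewer attempts. Once this is verified, (?: can be swapped for (?> on flavors that support it.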
Conclusion
Although the result is shorter, the goal here is optimization. I reduced the number of tests by factoring the alternatives, and the amount of backtracking by using atomic groups and possessive quantifiers.
some limitations
Please note that atomic groups and possessive quantifiers are not supported by all regex flavors, but you can work around the problem:
- for flavors that do not support possessive quantifiers: change ?+ to ?
- for flavors that do not support atomic groups: change (?> to (?:
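These two substitutions can be applied mechanically. Here is a minimal sketch in Python, assuming the pattern contains no escaped \?+ and no (?> inside a character class (a plain text replacement cannot tell those cases apart). The function name downgrade is hypothetical.

```python
import re

def downgrade(pattern):
    """Naive textual fallback: turn possessive '?+' into '?' and atomic '(?>' into '(?:'."""
    # Assumption: no escaped '\?+' and no '(?>' inside a character class.
    return pattern.replace("?+", "?").replace("(?>", "(?:")

# The first named group from the answer, in its PCRE form.
INITIAL = r"^(?P<initial>(?>[csz]h?+|[bdfghj-npqrtwxy])?)"

# Same group, usable by flavors without atomic groups or possessive quantifiers.
PORTABLE = downgrade(INITIAL)
```

The relaxed pattern is allowed to backtrack where the original is not, so it may be slower and is worth re-testing against your syllable list.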
(Note that there is a trick to emulate atomic groups in Python, which you can benchmark with a timer, by wrapping the entire pattern. See this incredible post: Do Python regular expressions have an equivalent to Ruby's atomic grouping?)
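I am assuming the trick in question is the well-known lookahead-plus-backreference idiom: a lookahead is never backtracked into once it has succeeded, so capturing inside it and consuming the capture with \1 behaves like an atomic group. A small sketch on a toy pattern (not the syllable pattern itself):

```python
import re

# Plain greedy quantifier: a+ can give back one 'a' so that "ab" still matches.
PLAIN = re.compile(r"a+ab")

# Emulation of (?>a+)ab: the lookahead captures all the 'a's, \1 consumes them,
# and nothing can be given back afterwards.
EMULATED = re.compile(r"(?=(a+))\1ab")

# Emulation of (?>a+)b: here nothing needs to be given back, so it matches.
EMULATED_OK = re.compile(r"(?=(a+))\1b")
```

Keep in mind that the extra capturing group shifts the numbering of any later groups.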
Some regex engines, such as older JavaScript engines, do not support lookbehinds. In that case, you must rewrite the entire pattern using only alternation (i.e. |), which is not so bad since lookbehinds slow the pattern down, and give up the named captures, which are not supported either. (In this context, note that to remove a negative lookbehind you need to put the syllables described in those parts before all the others so that they are matched first.)
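As a rough sketch of what the lookbehind-free form looks like, here is the ch group from the example rewritten with plain alternation and numbered groups only. It is written in Python for convenience, but it uses nothing beyond basic alternation; the names CH_FINALS, NO_LOOKBEHIND and split_syllable are hypothetical, and the other initials are deliberately left out.

```python
import re

# Finals copied from the (?<=ch) example, already listed longest first.
CH_FINALS = ["uang", "ang", "eng", "ong", "uai", "uan", "ai", "an", "ao", "en",
             "ou", "ua", "ui", "un", "uo", "a", "e", "i", "u"]

# The initial is matched literally instead of being checked by a lookbehind,
# and numbered groups replace the named captures.
NO_LOOKBEHIND = re.compile("^(ch)(" + "|".join(CH_FINALS) + ")$")

def split_syllable(s):
    """Return (initial, final) for a 'ch' syllable, or None if s is not one."""
    m = NO_LOOKBEHIND.match(s)
    return (m.group(1), m.group(2)) if m else None
```

For example, split_syllable("chuang") gives ("ch", "uang"), while a string without the ch initial gives None.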
other optimization methods
- rewrite your pattern without lookbehinds, using | instead
- sort the alternatives by most commonly used syllables