Why bother avoiding a backreference?

I understand that putting ?: at the start of a parenthesized group in a regular expression prevents it from creating a backreference, which is supposed to be faster. My question is: why do this? Is the speed increase noticeable enough to warrant the consideration? Under what circumstances does it matter so much that you need to carefully skip the backreference every time you are not going to use it? Another drawback is that it makes the regular expression harder to read, edit, and update (if you end up wanting to use the backreference later).

So why bother avoiding the backreference?

+4
2 answers

You are right, performance is not the only reason to avoid capturing groups. In fact, it is not even the most important reason.

"Another drawback is that it makes the regular expression harder to read, edit, and update (if you end up wanting to use the backreference later)."

I look at it the other way around: if you habitually use non-capturing groups, it is easier to keep track of the group numbers on the occasions when you do choose to capture something. By the same token, if you use named groups (assuming your regex flavor supports them), you should always use named groups and always refer to them (in backreferences or replacement strings) by name, not by number. Following these rules consistently will at least partially offset the readability penalty of non-capturing groups.
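To make the numbering argument concrete, here is a minimal sketch in Python's re module (my choice of flavor; the date pattern and variable names are made up for illustration). It shows how capturing the separators shifts the group numbers, and how named groups avoid the numbering question altogether.

```python
import re

# Hypothetical date pattern. The separators are groups only for alternation.
with_captures = re.compile(r"(\d{4})(-|/)(\d{2})(-|/)(\d{2})")
without_captures = re.compile(r"(\d{4})(?:-|/)(\d{2})(?:-|/)(\d{2})")
named = re.compile(
    r"(?P<year>\d{4})(?:-|/)(?P<month>\d{2})(?:-|/)(?P<day>\d{2})"
)

text = "2011/03/15"

# With capturing separators the month is group 3, not group 2.
groups_with = with_captures.match(text).groups()

# With non-capturing separators only the interesting parts are numbered.
groups_without = without_captures.match(text).groups()

# Named groups are referenced by name in the replacement string.
reordered = named.sub(r"\g<day>.\g<month>.\g<year>", text)

print(groups_with)     # ('2011', '/', '03', '/', '15')
print(groups_without)  # ('2011', '03', '15')
print(reordered)       # 15.03.2011
```

If a separator later needs to be captured after all, only the named version keeps working without renumbering any references.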

Yes, it's a PITA to have to clutter up your regexes this way, and the people who write and maintain regex implementations know it. In .NET, you can set the ExplicitCapture option, which makes all bare parentheses non-capturing groups so that only named groups capture. In Perl 6, parentheses (with or without names) always capture, and square brackets are used for non-capturing groups. Other flavors will probably follow suit eventually, but in the meantime we just have to rely on good habits.

+5

I think you are confusing backreferences like \1 with capturing groups (...).

Backreferences prevent all kinds of optimizations by making the language non-regular.

Capturing groups make the regex engine do a bit more work to remember where each group starts and ends, but they are not as bad as backreferences.

http://www.regular-expressions.info/brackets.html explains capturing groups and backreferences to them in detail.
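The distinction is easier to see side by side. This is a small sketch in Python's re module (the patterns and sample strings are illustrative, not from the question): a capturing group only records text, while a backreference makes the rest of the match depend on that recorded text.

```python
import re

# Capturing groups only: the engine records what each group matched.
capture_only = re.compile(r"(\w+) (\w+)")

# Backreference: \1 must repeat exactly what group 1 matched.
with_backref = re.compile(r"(\w+) \1")

# Non-capturing group: groups for repetition, records nothing.
grouping_only = re.compile(r"(?:ab)+")

captured = capture_only.match("hello world").groups()
repeated = with_backref.match("hello hello")
not_repeated = with_backref.match("hello world")
no_groups = grouping_only.match("ababab").groups()

print(captured)              # ('hello', 'world')
print(repeated is not None)  # True
print(not_repeated)          # None
print(no_groups)             # ()
```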

EDIT:

On backreferences making regular expressions non-regular, consider the following regular expression, which matches Lua comments:

 /^--(?:\[(=*)\[[\s\S]*?(?:\]\1\]|$)|[^\r\n]*)/ 

So --[[...]] is a comment, --[=[...]=] is a comment, and --[==[...]==] is a comment. You can nest comments by adding extra equals signs between the square brackets.

This cannot be matched by a strictly regular language, so a simple finite state machine cannot handle it in O(n) time: you need a counter.

Perl 5 regular expressions can handle this with backreferences. But as soon as you need non-regular pattern matching, your regular expression library has to abandon the simple state-machine approach and use more complex and less efficient code.
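As a quick check, the same pattern can be tried in Python's re module, which also supports \1 (an assumption on my part that the Perl-style syntax carries over unchanged; the helper name match_lua_comment and the sample inputs are made up for illustration).

```python
import re

# Same pattern as above, without the surrounding slashes.
LUA_COMMENT = re.compile(r"^--(?:\[(=*)\[[\s\S]*?(?:\]\1\]|$)|[^\r\n]*)")

def match_lua_comment(src):
    """Return the comment at the start of src, or None if there is none."""
    m = LUA_COMMENT.match(src)
    return m.group(0) if m else None

# Level-0 long comment spanning two lines.
print(match_lua_comment("--[[ a\nb ]] x = 1"))
# Level-1 long comment: the inner ]] does not close it, only ]=] does.
print(match_lua_comment("--[=[ a ]] b ]=] x = 1"))
# Short comment: runs to the end of the line.
print(match_lua_comment("-- plain\nx = 1"))
```

The backreference \1 is what forces the closing bracket to carry the same number of equals signs as the opening one.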
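For comparison, here is a rough hand-written sketch in Python of what "you need a counter" means. The function name lua_comment_end and its return convention are hypothetical, and it is not meant to reproduce every edge case of the regex (for example its handling of a trailing newline in an unterminated comment).

```python
def lua_comment_end(src):
    """Return the index just past the Lua comment at the start of src, or -1."""
    if not src.startswith("--"):
        return -1
    i = 2
    if i < len(src) and src[i] == "[":
        level = 0  # the counter: number of '=' in the opening bracket
        j = i + 1
        while j < len(src) and src[j] == "=":
            level += 1
            j += 1
        if j < len(src) and src[j] == "[":
            # The closer must carry exactly the same number of '='.
            closer = "]" + "=" * level + "]"
            end = src.find(closer, j + 1)
            # An unterminated long comment runs to the end of the input.
            return len(src) if end == -1 else end + len(closer)
    # Short comment: runs to the end of the line.
    while i < len(src) and src[i] not in "\r\n":
        i += 1
    return i

sample = "--[==[ a ]] b ]==] rest"
print(sample[:lua_comment_end(sample)])  # --[==[ a ]] b ]==]
```

The value of level is unbounded, which is exactly the piece of state a fixed finite automaton cannot hold.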

+13

Source: https://habr.com/ru/post/1343572/

