What is the purpose of a passive (non-capturing) group in JavaScript regex?

What is the purpose of a passive group in JavaScript regex?

A passive group opens with a question mark followed by a colon: (?:group)

In other words, these two things seem the same:

 "hello world".match(/hello (?:world)/)
 "hello world".match(/hello world/)

In what situations do you need a group without capture and why?

5 answers

The only difference from “normal” (capturing) groups is that non-capturing groups do not require the regex engine to remember what they matched.

The use case is that sometimes you have to (or should) use a group not because you are interested in what it captures, but for syntactic reasons. In those situations it makes sense to use a non-capturing group instead of a “standard” capturing one, because it is less resource-intensive. If you don't care about that, a capturing group will match in exactly the same way.

Your specific example is not a good case for non-capturing groups precisely because the two expressions are equivalent. A better example would be:

 input.match(/hello (?:world|there)/) 
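The group in that example is needed because alternation (|) has the lowest precedence in a regex: without the group, /hello world|there/ means "hello world" or "there" anywhere, not "hello " followed by either word. A minimal sketch comparing the two (the sample strings are made up for illustration):

```javascript
// Grouped: "hello " followed by either "world" or "there"; nothing is captured.
const grouped = /hello (?:world|there)/;

// Ungrouped: "|" splits the whole pattern, so this means "hello world" OR "there".
const ungrouped = /hello world|there/;

console.log(grouped.test("over there"));   // false
console.log(ungrouped.test("over there")); // true, because "there" alone matches

// The non-capturing group adds no entry to the match array...
console.log("hello there".match(grouped).length); // 1 (just the whole match)

// ...whereas a capturing group adds one at index 1.
console.log("hello there".match(/hello (world|there)/)[1]); // there
```

Both grouped forms match the same strings; the only difference is whether the chosen word is kept in the result.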

Two uses of capture groups

A capture group in a regular expression actually serves two different purposes (as the name “capture group” itself hints):

  • Grouping - if you need a group to be treated as a single unit, so that something can be applied to the whole group.

    Probably the most trivial example is an optional sequence of characters, e.g. "foo" optionally followed by "bar". In regex terms that is /foo(bar)?/ (capturing group) or /foo(?:bar)?/ (non-capturing group). Note that the trailing ? applies to the whole group (bar), which in this case is just the character sequence bar. If you only want to check whether the input matches your regex, it really doesn't matter whether you use a capturing or a non-capturing group; they work the same way (except that a non-capturing group can be slightly faster).

  • Capture - if you need to extract part of the input.

    For example, you want to get the number of rabbits from an input such as "The farm contains 8 cows and 89 rabbits" (not very good English, I know). The regex could be /(\d+)\s*rabbits\b/. If the match succeeds, you can read the value matched by the capture group from your JavaScript code (or any other programming language).

    In this example you have one capture group, so in JavaScript you access it at index 1 of the match array; index 0 holds the whole match (see this answer).

    Now imagine you want to make sure the “place” is a “farm” or a “ranch”. If it is neither, you do not want to extract the number of rabbits (in regex terms, you do not want the regex to match).

    So you rewrite your regex as /(farm|ranch).*\b(\d+)\s*rabbits\b/. The regex works on its own, but your JavaScript no longer does: there are now two capture groups, and you have to change your code to read the second capture group for the number of rabbits (i.e. change the index from 1 to 2). The first group now contains the string "farm" or "ranch", which you never intended to extract.

    A non-capturing group comes to the rescue: /(?:farm|ranch).*\b(\d+)\s*rabbits\b/. It still matches either "farm" or "ranch", but does not capture it, so it does not shift the indices of the subsequent capture groups. Your JavaScript code keeps working without changes.
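Both bullets can be checked directly in JavaScript. A minimal sketch using the rabbits input and the optional "bar" pattern from above; the variable names are just for illustration:

```javascript
const input = "The farm contains 8 cows and 89 rabbits";

// Grouping: the optional "bar" matches the same way either way,
// but only the capturing version adds an entry to the match array.
console.log("foobar".match(/foo(bar)?/).length);   // 2
console.log("foobar".match(/foo(?:bar)?/).length); // 1

// One capture group: the rabbit count is at index 1 (index 0 is the whole match).
const v1 = input.match(/(\d+)\s*rabbits\b/);
console.log(v1[1]); // 89

// A capturing (farm|ranch) pushes the count to index 2.
const v2 = input.match(/(farm|ranch).*\b(\d+)\s*rabbits\b/);
console.log(v2[1], v2[2]); // farm 89

// A non-capturing (?:farm|ranch) leaves the count at index 1.
const v3 = input.match(/(?:farm|ranch).*\b(\d+)\s*rabbits\b/);
console.log(v3[1]); // 89

// The place check still applies: no farm or ranch means no match at all.
console.log("The zoo has 12 rabbits".match(/(?:farm|ranch).*\b(\d+)\s*rabbits\b/)); // null
```

Code written against v1 (reading index 1) works unchanged with v3, but would have to be changed to read index 2 for v2.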


The example may be simplistic, but imagine a very complex regex with many groups, of which you want to capture only a few. That is where non-capturing groups really pay off: you don't have to count all your groups, only the capturing ones.

In addition, non-capturing groups serve a documentation purpose: to someone reading your code, a non-capturing group signals that you are not interested in extracting its content, only in making sure it matches.


A few words about separation of concerns

Capture groups are a typical example of a violation of the SoC (separation of concerns) principle: a single syntactic construct serves two different purposes. Once the problems became apparent, an additional construct, (?:, was introduced to switch off one of the two.

It was simply a design mistake. Perhaps the lack of “free” special characters played a role... but it was still poor design.

Regex is a very old, powerful and widely used concept, and for backward compatibility reasons this flaw is unlikely ever to be fixed. It is simply a lesson in how important separation of concerns is.


When you want to apply quantifiers to a group:

 /hello (?:world)?/
 /hello (?:world)*/
 /hello (?:world)+/
 /hello (?:world){3,6}/

and so on.
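Without the group, a quantifier binds only to the single token right before it, so /hello world+/ repeats just the "d". A minimal sketch contrasting the two; the ^ and $ anchors are added here only so the tests check the whole string:

```javascript
// Quantifier without a group: "hello worl" followed by one or more "d".
const noGroup = /^hello world+$/;
// Quantifier on a non-capturing group: "hello " followed by "world" one or more times.
const withGroup = /^hello (?:world)+$/;

console.log(noGroup.test("hello worldddd"));     // true
console.log(withGroup.test("hello worldddd"));   // false
console.log(noGroup.test("hello worldworld"));   // false
console.log(withGroup.test("hello worldworld")); // true

// {3,6} on the group: "world" repeated three to six times.
const repeated = /^hello (?:world){3,6}$/;
console.log(repeated.test("hello " + "world".repeat(2))); // false
console.log(repeated.test("hello " + "world".repeat(4))); // true
```

A capturing group would match identically here; (?:...) just avoids keeping a capture nobody reads.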


In addition to the answers above: if you use String.prototype.split() with a capturing group, the output array also contains the captured text (see MDN). With a non-capturing group, it does not.

 var myString = 'Hello 1 word. Sentence number 2.';
 var splits = myString.split(/(\d)/);
 console.log(splits);

Outputs:

 ["Hello ", "1", " word. Sentence number ", "2", "."] 

While swapping /(\d)/ for /(?:\d)/ results in:

 ["Hello ", " word. Sentence number ", "."] 

Use them when you need an either/or condition and do not care which alternative caused the match.

Non-capturing groups can simplify the result of matching complex expressions. Here, group 1 is always the speaker's name. With a capturing group instead, the speaker's name would move to group 2, and group 1 would hold the optional word (or be undefined), which your code would have to skip over.

/hello (?:world |foobar )?said (.+)/
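A minimal sketch of that speaker example. It uses /hello (?:world |foobar )?said (.+)/, with the trailing space inside each alternative so that both "hello world said ..." and "hello foobar said ..." match; the sample lines and the speakers helper are made up for illustration:

```javascript
const nonCapturing = /hello (?:world |foobar )?said (.+)/;
const capturing = /hello (world |foobar )?said (.+)/;

const lines = ["hello world said Alice", "hello foobar said Bob", "hello said Carol"];

// Hypothetical helper: pull one group out of every line.
function speakers(re, groupIndex) {
  return lines.map((line) => line.match(re)[groupIndex]);
}

// Non-capturing: the speaker is always group 1.
console.log(speakers(nonCapturing, 1)); // [ 'Alice', 'Bob', 'Carol' ]

// Capturing: the speaker moves to group 2, and group 1 holds the optional word.
console.log(speakers(capturing, 2)); // [ 'Alice', 'Bob', 'Carol' ]
console.log(speakers(capturing, 1)); // [ 'world ', 'foobar ', undefined ]
```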


Source: https://habr.com/ru/post/953002/

