The meaning of the question mark and x in a regex group

I am trying to learn Atom's grammar rules for syntax highlighting, which make heavy use of JS regular expressions, and I came across an unfamiliar pattern in the Python grammar file.

The pattern begins with (?x), which is a regex construct I don't recognize. I tried it in an online regex tester, which reports it as invalid. My first thought was that it was an optional left parenthesis, but in that case I think the parenthesis would need to be escaped.

Does this only have a meaning in Atom's CoffeeScript grammar files, or am I missing something about the regex?

(The same pattern also appears in the TextMate language file, which I assume Atom's grammar was derived from.)

2 answers

If this regular expression is processed in Python, it will be compiled with the 'verbose' flag.

From the Python re docs:

(?aiLmsux)

(One or more letters from the set 'a', 'i', 'L', 'm', 's', 'u', 'x'.) The group matches the empty string; the letters set the corresponding flags: re.A (ASCII-only matching), re.I (ignore case), re.L (locale dependent), re.M (multi-line), re.S (dot matches all), and re.X (verbose), for the entire regular expression. (The flags are described in Module Contents.) This is useful if you wish to include the flags as part of the regular expression, instead of passing a flag argument to the re.compile() function.
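A short sketch of what the quoted paragraph means in practice: an inline flag group at the start of the pattern behaves like the corresponding flag passed to re.compile(). The patterns and sample strings below are made up for illustration.

```python
import re

# (?i) inside the pattern is equivalent to passing re.IGNORECASE (re.I)
inline_i = re.compile(r"(?i)hello")
flag_i = re.compile(r"hello", re.IGNORECASE)

# (?x) inside the pattern is equivalent to passing re.VERBOSE (re.X):
# the space and the "# digits" comment are ignored in both cases
inline_x = re.compile(r"(?x) \d+  # digits")
flag_x = re.compile(r"\d+  # digits", re.VERBOSE)

for pattern in (inline_i, flag_i):
    print(pattern.search("Say HELLO").group(0))   # HELLO

for pattern in (inline_x, flag_x):
    print(pattern.search("abc 123").group(0))     # 123

# The inline group sets the same flag bit on the compiled pattern
print(bool(inline_x.flags & re.VERBOSE))          # True
```

Putting the flag group at the very start of the pattern, as the Atom grammar does, is the safe placement; recent Python versions reject global flag groups that appear elsewhere.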


The JavaScript regex engine does not support the VERBOSE x modifier, either inline or as a regular flag.
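Because JavaScript has no x flag, a verbose pattern would have to be collapsed to its compact form before a JS engine could use it. Below is a minimal sketch of that conversion in Python, assuming Python's re.VERBOSE rules (whitespace and # are literal inside [...] or after a backslash). strip_verbose is a hypothetical helper, not a library function, and it deliberately ignores some edge cases.

```python
import re

def strip_verbose(pattern: str) -> str:
    """Hypothetical helper: drop free-spacing whitespace and # comments.

    Assumes an optional leading "(?x)" and Python re.VERBOSE semantics.
    Edge cases such as "]" placed first inside a character class are
    not handled.
    """
    if pattern.startswith("(?x)"):
        pattern = pattern[len("(?x)"):]
    out = []
    i, n = 0, len(pattern)
    in_class = False
    while i < n:
        c = pattern[i]
        if c == "\\" and i + 1 < n:
            # Keep escape sequences intact, including an escaped space or "#"
            out.append(pattern[i:i + 2])
            i += 2
        elif in_class:
            # Inside [...], whitespace and "#" are literal characters
            if c == "]":
                in_class = False
            out.append(c)
            i += 1
        elif c == "[":
            in_class = True
            out.append(c)
            i += 1
        elif c.isspace():
            i += 1  # insignificant whitespace in verbose mode
        elif c == "#":
            # Comment: skip up to (not including) the next newline
            while i < n and pattern[i] != "\n":
                i += 1
        else:
            out.append(c)
            i += 1
    return "".join(out)


verbose = r"""(?x)
    \d+   # Digits
    \D+   # Non-digits up to...
    $     # The end of string
"""
compact = strip_verbose(verbose)
print(compact)                                        # \d+\D+$
print(re.search(compact, "My value: 56%").group(0))  # 56%
```

The compact string could then be written as a JavaScript regex literal; a real tool would need a full regex tokenizer rather than this character-by-character scan.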

See Free Spacing: x (except JavaScript) at rexegg.com:

By default, any space in a regex string specifies a character to be matched. In languages where you can write regex strings on multiple lines, the line breaks also specify literal characters to be matched. Because you cannot insert spaces to separate groups that carry different meanings (as you do between phrases and paragraphs when you write in English), a regex can become hard to read...
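A small Python check of the default behaviour described in the quote (the sample strings are invented for illustration): without verbose mode, both a space and a line break inside the pattern must appear in the input.

```python
import re

# Outside free-spacing mode, a space in the pattern is a character to match
spaced = r"\d+ %"
print(re.search(spaced, "Total: 56 %").group(0))  # 56 %
print(re.search(spaced, "Total: 56%"))            # None

# A line break written into a multi-line pattern string is also literal
multiline = r"""\d+
%"""
print(re.search(multiline, "56%"))                   # None
print(repr(re.search(multiline, "56\n%").group(0)))  # '56\n%'
```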

Fortunately, many engines support a free-spacing mode that lets you aerate your regex. For example, you can add spaces between the tokens.
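A sketch of what aerating a pattern looks like in Python's free-spacing mode (re.VERBOSE); the inputs are invented. Once spaces between tokens are ignored, a literal space has to be escaped or placed in a character class.

```python
import re

# In verbose (free-spacing) mode the space between tokens is ignored,
# so this pattern is effectively \d+%
aerated = re.compile(r"\d+ %", re.VERBOSE)
print(aerated.search("56%").group(0))   # 56%
print(aerated.search("56 %"))           # None

# To match a real space in verbose mode, escape it or wrap it in a class
escaped = re.compile(r"\d+ \  %", re.VERBOSE)
in_class = re.compile(r"\d+ [ ] %", re.VERBOSE)
print(escaped.search("56 %").group(0))   # 56 %
print(in_class.search("56 %").group(0))  # 56 %
```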

You may also see this mode called whitespace mode, comment mode, or verbose mode.

Here's how it might look in Python:

```python
import re

regex = r"""(?x)
    \d+   # Digits
    \D+   # Non-digits up to...
    $     # The end of string
"""
print(re.search(regex, "My value: 56%").group(0))  # => 56%
```

Source: https://habr.com/ru/post/1232278/

