Grep and regex - why am I escaping curly braces?

I am deeply puzzled by how grep seems to parse the regex:

$ echo "@NS500287" | grep '^@NS500[0-9]{3}'
# nothing
$ echo "@NS500287" | grep '^@NS500[0-9]\{3\}'
@NS500287

That can't be right. Why am I escaping the curly braces that are part of the "match the previous item N times" construct (and not, say, the square brackets as well)?

Shouldn't escaping be necessary only when I write a regular expression that actually matches { and } as literal characters in the query string?

This is more of a cri de coeur than anything else, but I'm curious about the answer.

+6
3 answers

This is because { and } are special characters, and they must be written differently in order to get their special behavior. Otherwise, they are treated as the literal characters { and }.

You can either escape them, as you did:

 $ echo "@NS500287" | grep '^@NS500[0-9]\{3\}'
 @NS500287

or use grep -E:

 $ echo "@NS500287" | grep -E '^@NS500[0-9]{3}'
 @NS500287

Without any escaping, a brace is matched literally:

 $ echo "he{llo" | grep "{"
 he{llo

From man grep:

-E, --extended-regexp

Interpret PATTERN as an extended regular expression (ERE, see below). (-E is specified by POSIX.)

...

REGULAR EXPRESSIONS

A regular expression is a pattern that describes a set of strings. Regular expressions are constructed analogously to arithmetic expressions, by using various operators to combine smaller expressions.

grep understands three different versions of regular expression syntax: "basic", "extended" and "perl". In GNU grep, there is no difference in available functionality between the basic and extended syntaxes. In other implementations, basic regular expressions are less powerful. The following description applies to extended regular expressions; differences for basic regular expressions are summarized afterwards. Perl regular expressions give additional functionality, and are documented in pcresyntax(3) and pcrepattern(3), but may not be available on every system.

...

Basic and extended regular expressions

In basic regular expressions, the metacharacters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).
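The same rule from the man page excerpt applies to the other metacharacters it lists, not only to braces. The following is a minimal sketch, assuming GNU grep (the backslashed \+, \? and \| are GNU extensions to basic regular expressions); the count helper and the sample words are hypothetical and only there for illustration.

```shell
#!/bin/sh
# Assumes GNU grep: \+, \? and \| in BRE are GNU extensions.

# count: hypothetical helper that prints the number of matching input lines.
# grep exits with status 1 when nothing matches; "|| true" keeps that from
# aborting a script that runs under "set -e".
count() { grep -c "$@" || true; }

# "+" (one or more): literal in BRE, operator when backslashed or with -E.
plus_bre_plain=$(printf 'aaa\n' | count '^a+$')       # 0: looks for a literal "+"
plus_bre_escaped=$(printf 'aaa\n' | count '^a\+$')    # 1
plus_ere=$(printf 'aaa\n' | count -E '^a+$')          # 1

# "?" (optional item).
opt_bre_plain=$(printf 'color\ncolour\n' | count '^colou?r$')      # 0
opt_bre_escaped=$(printf 'color\ncolour\n' | count '^colou\?r$')   # 2
opt_ere=$(printf 'color\ncolour\n' | count -E '^colou?r$')         # 2

# "(", ")" and "|" (grouping and alternation).
alt_bre_escaped=$(printf 'cat\ndog\nbird\n' | count '^\(cat\|dog\)$')   # 2
alt_ere=$(printf 'cat\ndog\nbird\n' | count -E '^(cat|dog)$')           # 2

echo "plus: $plus_bre_plain $plus_bre_escaped $plus_ere"
echo "opt:  $opt_bre_plain $opt_bre_escaped $opt_ere"
echo "alt:  $alt_bre_escaped $alt_ere"
```

On a system with GNU grep this should print 0 1 1, then 0 2 2, then 2 2; other grep implementations may treat the backslashed forms differently.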

+6

The answer comes down to the difference between basic regular expressions (BRE) and extended regular expressions (ERE).

  • In BRE mode (that is, when you call grep without an option that says otherwise), { and } are interpreted as literal characters. Escaping them with \ means that they should be interpreted as a repetition count for the preceding pattern.

  • If you use grep -E instead (ERE mode), you can write { and } without escaping to specify the count. In ERE mode, escaping the braces makes them be interpreted literally instead.
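The way the two modes swap the meaning of a bare and an escaped brace can be checked side by side. This is a small sketch rather than a definitive reference: the count helper is hypothetical, and the inputs aaa and he{llo are illustrative (the latter borrowed from the example in the first answer).

```shell
#!/bin/sh
# count: hypothetical helper that prints the number of matching input lines.
# "|| true" hides grep's exit status 1 (no match) from "set -e" scripts.
count() { grep -c "$@" || true; }

# BRE (plain grep): a bare { is a literal character ...
bre_literal=$(printf 'he{llo\n' | count 'he{llo')      # 1
# ... so a{3} does not mean "three a's" ...
bre_no_count=$(printf 'aaa\n' | count '^a{3}$')        # 0
# ... and the repetition count needs backslashes.
bre_count=$(printf 'aaa\n' | count '^a\{3\}$')         # 1

# ERE (grep -E): a bare {3} is the repetition count ...
ere_count=$(printf 'aaa\n' | count -E '^a{3}$')        # 1
# ... and a backslash turns { back into a literal character.
ere_literal=$(printf 'he{llo\n' | count -E 'he\{llo')  # 1

echo "BRE: literal=$bre_literal bare-count=$bre_no_count escaped-count=$bre_count"
echo "ERE: count=$ere_count escaped-literal=$ere_literal"
```

So the escaping in the question is not about matching a literal brace; in BRE mode it is the backslash that switches the counting behavior on.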

+5

Do this instead, using egrep (note the leading e), which behaves like grep -E:

 echo '@NS500287' | egrep '^@NS500[0-9]{3}'
0

Source: https://habr.com/ru/post/977838/

