Grep and regex - why am I escaping curly braces?

Question

I am deeply puzzled by how grep seems to parse the regex:

$ echo "@NS500287" | grep '^@NS500[0-9]{3}'
#nothing
$ echo "@NS500287" | grep '^@NS500[0-9]\{3\}'
@NS500287

This can't be right. Why am I escaping the curly braces that are part of the "match the previous, N times" construct (and not, say, the square brackets as well)?

Shouldn't escaping be necessary only when I am writing a regular expression that actually matches { and } as literal characters in the query string?

More of a cri de coeur than anything else, but I'm curious about the answer.

+6

bash regex shell grep escaping

Justin St. Giles payne Nov 06 '14 at 15:30


3 answers

The answer relates to the difference between basic regular expressions (BRE) and extended regular expressions (ERE).

In BRE mode (that is, when you call grep without an option that says otherwise), { and } are interpreted as literal characters. Escaping them with \ means that they are interpreted as the number of repetitions of the previous pattern.
If you use grep -E instead (ERE mode), you can use { and } without escaping to specify the count. In ERE mode, escaping the braces makes them match literally instead.
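The four combinations described above can be checked side by side. This is only a minimal sketch: the variable names and the second sample string x{3} are made up for illustration, and it assumes a POSIX-style grep that supports -E and -c.

```shell
#!/bin/sh
# Hypothetical sample inputs for illustration.
input='@NS500287'
braces='x{3}'

# BRE (plain grep): \{3\} is a repetition count, so the three digits match.
bre_count=$(printf '%s\n' "$input" | grep -c '^@NS500[0-9]\{3\}' || true)

# BRE: an unescaped {3} is the literal text "{3}", which the input lacks.
bre_literal=$(printf '%s\n' "$input" | grep -c '^@NS500[0-9]{3}' || true)

# ERE (grep -E): an unescaped {3} is a repetition count.
ere_count=$(printf '%s\n' "$input" | grep -E -c '^@NS500[0-9]{3}' || true)

# ERE: escaped braces match literal { and } characters.
ere_literal=$(printf '%s\n' "$braces" | grep -E -c 'x\{3\}' || true)

# Each value is the number of matching lines (1 = match, 0 = no match).
printf 'bre_count=%s bre_literal=%s ere_count=%s ere_literal=%s\n' \
  "$bre_count" "$bre_literal" "$ere_count" "$ere_literal"
```

The `|| true` is there only because grep exits with a non-zero status when nothing matches, which would otherwise abort a script running under `set -e`.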

+5

Tom fenech Nov 06 '14 at 15:34


Do this instead:

 echo '@NS500287' | egrep '^@NS500[0-9]{3}'
 # notice egrep rather than grep

0

Steven penny Nov 08 '14 at 2:34


fedorqui · Accepted Answer · 2014-11-06T15:32:25+0000

This is because { and } are special characters, and they need to be handled differently to get this special behaviour. Otherwise, they are treated as a literal { and }.

You can either escape them, as you did:

 $ echo "@NS500287" | grep '^@NS500[0-9]\{3\}'
 @NS500287

or use grep -E:

 $ echo "@NS500287" | grep -E '^@NS500[0-9]{3}'
 @NS500287

Without any escaping, the brace is matched literally:

 $ echo "he{llo" | grep "{"
 he{llo

From man grep:

-E, --extended-regexp
Interpret PATTERN as an extended regular expression (ERE, see below). (-E is specified by POSIX.)
...
REGULAR EXPRESSIONS
A regular expression is a pattern that describes a set of strings. Regular expressions are constructed analogously to arithmetic expressions, by using various operators to combine smaller expressions.
grep understands three different versions of regular expression syntax: "basic", "extended" and "perl". In GNU grep, there is no difference in available functionality between basic and extended syntaxes. In other implementations, basic regular expressions are less powerful. The following description applies to extended regular expressions; differences for basic regular expressions are summarized afterwards. Perl regular expressions give additional functionality, and are documented in pcresyntax(3) and pcrepattern(3), but may not be available on every system.
...
Basic vs Extended Regular Expressions
In basic regular expressions the meta-characters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).
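The same rule applies to the other metacharacters in that list. The sketch below is illustrative only: the sample strings and variable names are invented, and it sticks to + and grouping parentheses, assuming a POSIX-style grep with -E and -c.

```shell
#!/bin/sh
# BRE: + is an ordinary character, so the pattern matches the literal text "a+b".
plus_bre=$(printf '%s\n' 'a+b' | grep -c 'a+b' || true)

# ERE: + means "one or more of the previous item", so the pattern needs
# an "a" directly followed by "b", which the text "a+b" does not contain.
plus_ere=$(printf '%s\n' 'a+b' | grep -E -c 'a+b' || true)

# BRE: unescaped parentheses are literal, so this matches the text "f(x)".
paren_bre=$(printf '%s\n' 'f(x)' | grep -c 'f(x)' || true)

# Grouping plus a repetition count, written in each syntax.
group_bre=$(printf '%s\n' 'abab' | grep -c '\(ab\)\{2\}' || true)
group_ere=$(printf '%s\n' 'abab' | grep -E -c '(ab){2}' || true)

# Each value is the number of matching lines (1 = match, 0 = no match).
printf 'plus_bre=%s plus_ere=%s paren_bre=%s group_bre=%s group_ere=%s\n' \
  "$plus_bre" "$plus_ere" "$paren_bre" "$group_bre" "$group_ere"
```

The two grouping patterns describe the same match. Which one to use comes down to whether grep is called with or without -E.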
