Grep doesn't show results, online regex tester does

I'm fairly inexperienced with grep. I have a bunch of XML files containing lines like this:

<identifier type="abc">abc:def.ghi/g1234.ab012345</identifier>
<identifier type="abc">abc:def.ghi/g5678m.ab678901</identifier>

I wanted to get the part of the identifier after the slash, so I built a regex using RegexPal:

[a-z]\d{4}[a-z]*\.[a-z]*\d*

It highlights everything I wanted. Fine. But when I run grep on the same file, I get no results. As I said, I know very little about grep, so I tried several different combinations.

grep [a-z]\d{4}[a-z]*\.[a-z]*\d* test.xml
grep "[a-z]\d{4}[a-z]*\.[a-z]*\d*" test.xml
egrep "[a-z]\d{4}[a-z]*\.[a-z]*\d*" test.xml
grep '[a-z]\d{4}[a-z]*\.[a-z]*\d*' test.xml
grep -E '[a-z]\d{4}[a-z]*\.[a-z]*\d*' test.xml

What am I doing wrong?

+3
7 answers

Your regular expression does not match the input. Let's break it down:

  • [a-z] matches g
  • \d{4} matches 1234
  • [a-z]* matches zero letters here, because the next character is .

Note that the grep family does not support \d by default. Use [0-9] or [[:digit:]] instead.
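A minimal sketch of the bracket-class form, assuming a grep that supports -E and -o (GNU and BSD grep both do). In POSIX syntax the class name goes inside a bracket expression, so it is written [[:digit:]]. The sample line is copied from the question; the variable names are only for illustration.

```shell
# Sample line taken from the question.
line='<identifier type="abc">abc:def.ghi/g1234.ab012345</identifier>'

# [[:digit:]] is the POSIX character class for digits; -o prints only the match.
match=$(printf '%s\n' "$line" | grep -E -o '[a-z][[:digit:]]{4}[a-z]*\.[a-z]*[[:digit:]]*')

echo "$match"   # prints g1234.ab012345
```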

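Finally, when using regular expressions, prefer egrep (equivalent to grep -E) to plain grep, since egrep supports more regex operators. Also, in many shells (including bash on OS X), put the pattern in single quotes; left unquoted, * can be expanded by the shell into file names before grep ever sees it (the same goes for other shell meta-characters). Bash does not interpret anything inside single quotes.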
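To see what the shell actually hands to a command, here is a small sketch that uses printf in place of grep. The scratch directory and the file names a1 and a2 are made up for the demonstration.

```shell
# Work in a scratch directory so the glob has something to expand to.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch a1 a2

# Unquoted: the shell replaces a* with the matching file names first.
unquoted=$(printf '%s\n' a*)

# Single-quoted: the pattern reaches the command untouched.
quoted=$(printf '%s\n' 'a*')

echo "$unquoted"   # prints a1 and a2 on separate lines
echo "$quoted"     # prints a*
```

If no file name matches, bash leaves the unquoted pattern as it is by default, which is why an unquoted regex sometimes appears to work and then fails in another directory.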

+8

grep does not support \d by default. To match a digit, use [0-9], or enable Perl-compatible regular expressions:

$ grep -P "[a-z]\d{4}[a-z]*\.[a-z]*\d*" test.xml

$ egrep "[a-z][0-9]{4}[a-z]*\.[a-z]*[0-9]*" test.xml
+5

grep uses "basic" regular expressions by default (excerpt from the man page):

Basic vs Extended Regular Expressions
   In basic regular expressions the meta-characters ?, +, {, |, (, and ) lose their
   special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and
   \).

   Traditional egrep did not support the { meta-character, and some egrep
   implementations support \{ instead, so portable scripts should avoid { in
   grep -E patterns and should use [{] to match a literal {.

   GNU grep -E attempts to support traditional usage by assuming that { is not
   special if it would be the start of an invalid interval specification. For
   example, the command grep -E '{1' searches for the two-character string {1
   instead of reporting a syntax error in the regular expression. POSIX.2 allows
   this behavior as an extension, but portable scripts should avoid it.
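As a rough illustration of the excerpt, the same interval can be written both ways: backslashed braces in basic mode, bare braces with -E. This sketch assumes a grep with -o support and uses [0-9] because \d is not available in either mode; the sample string is the identifier from the question.

```shell
sample='abc:def.ghi/g1234.ab012345'

# Basic regular expressions (plain grep): the interval needs \{ and \}.
bre=$(printf '%s\n' "$sample" | grep -o '[a-z][0-9]\{4\}[a-z]*\.[a-z]*[0-9]*')

# Extended regular expressions (grep -E): { and } are special as written.
ere=$(printf '%s\n' "$sample" | grep -E -o '[a-z][0-9]{4}[a-z]*\.[a-z]*[0-9]*')

echo "$bre"   # prints g1234.ab012345
echo "$ere"   # prints g1234.ab012345
```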

Also, depending on which shell you run this in, the '*' character might get expanded.

+2

Try this:

$ cat file
<identifier type="abc">abc:def.ghi/g1234.ab012345</identifier>

# Use -P option to enable Perl style regex \d.
$ grep -P '[a-z]\d{4}[a-z]*\.[a-z]*\d*' file
<identifier type="abc">abc:def.ghi/g1234.ab012345</identifier>

# to get only the part of the input that matches use -o option:
$ grep -P -o '[a-z]\d{4}[a-z]*\.[a-z]*\d*' file
g1234.ab012345

# You can use [0-9] in place of \d and use the -E option.
$ grep -E -o '[a-z][0-9]{4}[a-z]*\.[a-z]*[0-9]*' file
g1234.ab012345
+1

:

[-]\ {5} [.] [A-Z] {2}\d {6}

0

grep:

[a-z]\d{4}[a-z]*\.[a-z]*\d*
0

Also, don't use regexps to parse XML/HTML. See: RegEx match open tags except XHTML self-contained tags

-1

Source: https://habr.com/ru/post/1775152/

