Linux regex matching character ß

I am seeing something on Linux that I do not understand. Can someone tell me why the first regular expression does not match ß-carotene?

$ cat cmpg
ß-Cyclopentyl-4-(7H-pyrrolo[2,3-d]pyrimidin-4-yl)-((3R)-1H-pyrazole-1-propanenitrile
ß-Cyclopentyl-4-(7H-pyrrolo[2,3-d]pyrimidin-4-yl)-((R)-1H-pyrazole-1-propanenitrile
ß-carotene
$ cat cmpg|awk '/[^\w\s({)}\r\n\[\]],/'
ß-Cyclopentyl-4-(7H-pyrrolo[2,3-d]pyrimidin-4-yl)-((3R)-1H-pyrazole-1-propanenitrile
ß-Cyclopentyl-4-(7H-pyrrolo[2,3-d]pyrimidin-4-yl)-((R)-1H-pyrazole-1-propanenitrile
$ cat cmpg|awk '/ß/'
ß-Cyclopentyl-4-(7H-pyrrolo[2,3-d]pyrimidin-4-yl)-((3R)-1H-pyrazole-1-propanenitrile
ß-Cyclopentyl-4-(7H-pyrrolo[2,3-d]pyrimidin-4-yl)-((R)-1H-pyrazole-1-propanenitrile
ß-carotene

Thanks for the help!

+4
2 answers
 $ cat cmpg|awk '/[^\w\s({)}\r\n\[\]],/' 

matches only lines containing at least one comma.

As for why the negated character class matches the 2 (which puzzled me, because \w includes all ASCII digits, so [^\w...] should not match 2): awk uses POSIX extended regular expressions, which do not know the shorthands \w and \s. Inside a bracket expression you will need to use [:alnum:] or [:space:] instead.
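A minimal sketch of that substitution, assuming an awk that supports POSIX character classes (gawk does; very old mawk builds may not). The two sample lines are made up for illustration and are ASCII-only, so the locale should not matter. The parentheses, braces and square brackets from the original class are left out to keep the example short.

```shell
# Hypothetical sample input: one line with "2," and one with "-,".
sample='pyrrolo[2,3-d]pyrimidine
alpha-,beta'

# [:alnum:] covers letters and digits, so "2" can no longer be the
# character in front of the comma; only "-," satisfies the pattern.
posix_hits=$(printf '%s\n' "$sample" | awk '/[^[:alnum:]_[:space:]],/')
printf '%s\n' "$posix_hits"
```

With the class written this way, only the second sample line should be printed, because the comma in the first line is preceded by a digit.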

In general, this regular expression would be strange in any regex flavor. What are you trying to achieve with it?

+8
 $ cat cmpg|awk '/[^\w\s({)}\r\n\[\]],/' 

matches any line that contains these 2 consecutive characters:

  • The first character must NOT be any of the following ( [^ ):

    • \w : word character (letters, digits and underscore)
      • OR a literal w if this awk version does not know about the special meaning of \w
    • \s : whitespace (this can be many things when using Unicode, not just space and tab)
      • OR a literal s if this awk version does not know about the special meaning of \s
    • ( : a (
    • { : a {
    • ) : a )
    • } : a }
    • \r : carriage return
    • \n : newline (line feed)
    • \[ : a [
    • \] : a ]
  • The 2nd character should be:

    • , : a , (comma).

The last line does NOT contain a comma, so it cannot match. (Otherwise the ß would match as the first character, since it is not in the list above.)
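The comma requirement can be checked on its own. A small sketch, assuming the three compound names from the question as input; index() is a standard awk function that returns 0 when the substring is absent.

```shell
# The three lines from the question's cmpg file.
names='ß-Cyclopentyl-4-(7H-pyrrolo[2,3-d]pyrimidin-4-yl)-((3R)-1H-pyrazole-1-propanenitrile
ß-Cyclopentyl-4-(7H-pyrrolo[2,3-d]pyrimidin-4-yl)-((R)-1H-pyrazole-1-propanenitrile
ß-carotene'

# Print only the lines that contain no comma at all; such lines can
# never match a pattern that ends in a literal ",".
no_comma=$(printf '%s\n' "$names" | awk 'index($0, ",") == 0')
printf '%s\n' "$no_comma"
```

Only ß-carotene lacks a comma, which is why it is the one line the first regular expression skips.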

+3

Source: https://habr.com/ru/post/1486639/

