Why is the underscore (_) not considered a non-word character?

Why is the underscore character (_) not considered a non-word character? The regex \W matches all non-word characters, but it does not match underscores.

+4
4 answers

According to Jeffrey Friedl's book on regular expressions, this originated as a change to Perl's regular expressions back in 1988, made so that \w would agree with the characters allowed in a Perl variable name [page 89]:

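Perl 2 was released in 1988. [...] Spencer [...] | [...] Support for \d and \s was added, and \w was changed to include the underscore, so that it matched the characters allowed in a Perl variable name.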

+4

\W is equivalent to [^A-Za-z0-9_].

It is the negation of \w, that is, of [A-Za-z0-9_], the "word" characters.

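Here a "word" character is one that can appear in an identifier (a variable name), and that includes the underscore (_).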

+2

According to regex101, \W matches any non-word character (equivalent to [^a-zA-Z0-9_]). It seems to have been a choice made by the designers.
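To check the equivalence yourself, here is a minimal Python sketch. The sample string is made up for illustration, and the re.ASCII flag is an assumption added so that \W behaves like the ASCII class quoted above (without it, Python 3 also treats Unicode letters and digits as word characters).

```python
import re

# Made-up sample text containing an underscore, spaces, a hyphen and "!".
text = "foo_bar baz-qux 42!"

# re.ASCII restricts \w to [a-zA-Z0-9_], so \W becomes [^a-zA-Z0-9_].
non_word = re.findall(r"\W", text, flags=re.ASCII)

# The same search written out as an explicit negated character class.
explicit = re.findall(r"[^a-zA-Z0-9_]", text)

print(non_word)              # the underscore is not in this list
print(non_word == explicit)  # both patterns find the same characters
```

The underscore in foo_bar is skipped by both patterns, while the spaces, the hyphen and the exclamation mark are reported.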

-1

The definition of a word character is based on the characters that can be used in an identifier in many programming languages, namely [A-Za-z0-9_].
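A rough Python sketch of why this is convenient: because the underscore counts as a word character, \w+ picks up whole identifiers in one piece. The helper is_identifier and the sample names are hypothetical, and the rule that the first character must not be a digit is an assumption borrowed from common language conventions, not something stated above.

```python
import re

# Hypothetical ASCII-only identifier pattern: a letter or underscore,
# followed by any number of word characters ([A-Za-z0-9_]).
IDENTIFIER = re.compile(r"[A-Za-z_]\w*", flags=re.ASCII)

def is_identifier(name: str) -> bool:
    """Return True if the whole string looks like an identifier."""
    return IDENTIFIER.fullmatch(name) is not None

# Made-up sample names for illustration.
names = ["my_var", "_private", "total2", "2fast", "my-var"]
results = {name: is_identifier(name) for name in names}
print(results)

# \w+ keeps snake_case names intact instead of splitting at the underscore.
tokens = re.findall(r"\w+", "my_var = other_var + 1", flags=re.ASCII)
print(tokens)
```

If the underscore were a non-word character, my_var would be split into my and var, and \b would report a word boundary in the middle of the name.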

-1

Source: https://habr.com/ru/post/1695467/

