Regular Expression Uncertainty

I have a regex to validate identifiers in a scripting language. They begin with a letter or underscore and can be followed by zero or more letters, underscores, digits, and dollar signs. However, if I call

Util.IsValidIdentifier( "hello\n" ); 

it returns true. My regex is:

 const string IDENTIFIER_REGEX = @"^[A-Za-z_][A-Za-z0-9_\$]*$"; 

So how does the "\n" get through?

+4
3 answers

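$ matches at the end of the string and also just before a final \n, which is why "hello\n" passes. Use \z to match only at the very end of the text; unlike $, it is not affected by RegexOptions.Multiline. You can also use \A instead of ^ to match the beginning of the text rather than the beginning of a line.

In addition, you do not need to escape $ inside the character class.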
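The difference between the two anchors can be checked directly. The following is a minimal sketch, assuming .NET's System.Text.RegularExpressions and C# top-level statements; the names LooseIdentifier, StrictIdentifier, IsLooseMatch, and IsStrictMatch are hypothetical and are not part of the asker's Util class.

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern from the question (without the redundant backslash):
// $ also matches just before a final "\n".
const string LooseIdentifier = @"^[A-Za-z_][A-Za-z0-9_$]*$";
// Assumed fix: \A and \z anchor to the absolute start and end of the input.
const string StrictIdentifier = @"\A[A-Za-z_][A-Za-z0-9_$]*\z";

bool IsLooseMatch(string s) => Regex.IsMatch(s, LooseIdentifier);
bool IsStrictMatch(string s) => Regex.IsMatch(s, StrictIdentifier);

Console.WriteLine(IsLooseMatch("hello\n"));   // True: $ matches before the trailing newline
Console.WriteLine(IsStrictMatch("hello\n"));  // False: \z requires the very end of the input
Console.WriteLine(IsStrictMatch("hello"));    // True
Console.WriteLine(IsStrictMatch("_tmp$1"));   // True: $ is a literal inside the character class

// Escaped and unescaped $ inside a character class accept the same input.
Console.WriteLine(Regex.IsMatch("$", @"\A[\$]\z"));  // True
Console.WriteLine(Regex.IsMatch("$", @"\A[$]\z"));   // True
```

With the strict pattern, only the anchors change; the character classes are the ones from the question.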

+5

Because $ is a metacharacter that matches at the end of the string (or immediately before a newline at the end of the string). From MSDN:

$: The match must occur at the end of the string or before \n at the end of the line or string.

You should escape it: \$ (and add \z if you want to match the end of the string there).
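To see where $ actually matches, one can ask for the index of its first match. This is a small sketch, assuming .NET's System.Text.RegularExpressions and C# top-level statements; the Sample string and the FirstDollarIndex helper are hypothetical and only serve the illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical two-line input that ends with a newline (length 6).
const string Sample = "ab\ncd\n";

// Index of the first position where $ matches under the given options.
int FirstDollarIndex(RegexOptions options) => Regex.Match(Sample, "$", options).Index;

Console.WriteLine(FirstDollarIndex(RegexOptions.None));      // 5: just before the final "\n"
Console.WriteLine(FirstDollarIndex(RegexOptions.Multiline)); // 2: just before the first "\n"

// \z ignores Multiline and matches only at the very end of the input.
Console.WriteLine(Regex.Match(Sample, @"\z", RegexOptions.Multiline).Index); // 6
```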

+1

You do not need to escape $ inside a character class: it is already a literal there, so the backslash in [A-Za-z0-9_\$] is redundant and the class matches the same characters without it.

Try the following:

 const string IDENTIFIER_REGEX = @"^[A-Za-z_][A-Za-z0-9_$]*$"; 

Since the variable names you are checking each sit on a single line, you can use $ as the end-of-line anchor.

0

Source: https://habr.com/ru/post/1488537/

