EOL special character does not match

I am trying to find every occurrence of the pattern "a -> b, c, d" in an input string. The pattern I'm using is as follows:

"^[ \t]*(\\w+)[ \t]*->[ \t]*(\\w+)((?:,[ \t]*\\w+)*)$" 

This is a C# pattern: "\t" is a tab (a single escaped literal, interpreted by the .NET String API), and "\\w" is the well-known predefined regex character class, double-escaped so that the .NET String API interprets it as "\w" and the .NET Regex API then interprets that as the word class.

Input:

 a -> b
 b -> c
 c -> d

Function:

 private void ParseAndBuildGraph(String input)
 {
     MatchCollection mc = Regex.Matches(input, "^[ \t]*(\\w+)[ \t]*->[ \t]*(\\w+)((?:,[ \t]*\\w+)*)$", RegexOptions.Multiline);
     foreach (Match m in mc)
     {
         Debug.WriteLine(m.Value);
     }
 }

Output:

 c -> d 

The problem is actually with the end-of-line special character "$". If I insert "\r" before "$", it works, but I thought "$" would match any line termination (with the Multiline option), especially \r\n in a Windows environment. Isn't that the case?

3 answers

That surprised me too. In .NET regexes, $ does not match before a line separator; it matches only before a linefeed, the character \n. This behavior is consistent with Perl's regex flavor, but in my opinion it is still wrong. According to the Unicode standard, $ should match before any of:

\n , \r\n , \r , \x85 , \u2028 , \u2029 , \v or \f

... and never match between \r and \n. Java complies with this (except for \v and \f), but .NET, which came out after Java and whose Unicode support is no worse than Java's, recognizes only \n. You would think they would at least handle \r\n correctly, given how strongly Microsoft is associated with that line separator.
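To make the behavior concrete, here is a minimal sketch that runs the pattern from the question against input with Windows line endings, once unchanged and once with an optional \r before the anchor. The class name EolDemo and the helper MatchLines are made up for this illustration; only Regex.Matches and RegexOptions.Multiline come from .NET.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class EolDemo
{
    // The pattern from the question, written as a verbatim string.
    public const string Strict =
        @"^[ \t]*(\w+)[ \t]*->[ \t]*(\w+)((?:,[ \t]*\w+)*)$";

    // Same pattern, but an optional carriage return is consumed before $.
    public const string Tolerant =
        @"^[ \t]*(\w+)[ \t]*->[ \t]*(\w+)((?:,[ \t]*\w+)*)\r?$";

    // Returns the text of every match found in multiline mode.
    public static string[] MatchLines(string input, string pattern) =>
        Regex.Matches(input, pattern, RegexOptions.Multiline)
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();
}
```

With the input "a -> b\r\nb -> c\r\nc -> d", the Strict pattern should find only "c -> d" (the one line not followed by \r), while Tolerant should find all three lines. Note that with Tolerant the \r is part of each matched value, so the first two results end in a carriage return.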

Remember that . follows the same rule: it does not match \n (unless Singleline mode is set), but it does match \r. If you had used .+ instead of \w+ in your regex, you might not have noticed this problem; the carriage return would have been included in the match, but the console would have ignored it when printing the results.
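A small sketch of that effect, assuming the same kind of \r\n-terminated input as in the question. DotDemo and FirstLine are hypothetical names used only for this example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class DotDemo
{
    // Returns the first line as seen by ".+" in multiline mode.
    // The dot stops at \n but happily consumes \r.
    public static string FirstLine(string input) =>
        Regex.Match(input, @"^.+$", RegexOptions.Multiline).Value;
}
```

For the input "a -> b\r\nb -> c", FirstLine should return "a -> b\r": seven characters, with the trailing carriage return silently included in the match.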

EDIT: If you want to allow for the carriage return without including it in your results, you can replace the anchor with a lookahead: (?=\r?\n
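The lookahead above is cut off in this copy of the answer. The sketch below assumes one plausible completion, (?=\r?\n|\z), where the \z alternative (end of string) is my addition so that a final line without a trailing line break still matches; it is not necessarily what the answer originally contained. LookaheadDemo and MatchLines are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class LookaheadDemo
{
    // The question's pattern with $ replaced by a lookahead.
    // ASSUMPTION: the "|\z" branch is a guess at the truncated original.
    public const string Pattern =
        @"^[ \t]*(\w+)[ \t]*->[ \t]*(\w+)((?:,[ \t]*\w+)*)(?=\r?\n|\z)";

    // Returns the text of every match; the lookahead consumes nothing,
    // so no \r or \n ends up in the results.
    public static string[] MatchLines(string input) =>
        Regex.Matches(input, Pattern, RegexOptions.Multiline)
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();
}
```

For "a -> b\r\nb -> c\r\nc -> d" this should return "a -> b", "b -> c" and "c -> d", none of them carrying a carriage return.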


Do you mean \t as the regex \t or as the C# \t? I always use verbatim string literals with regexes:

 @"^[ \t]*(\w+)[ \t]*->[ \t]*(\w+)(,[ \t]*\w+)*$" 

(you only need to to)
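As a quick check of the difference between the two kinds of literal, the following sketch compares regular and verbatim C# strings. LiteralDemo and its members are hypothetical names chosen for this example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class LiteralDemo
{
    // "\\w+" (regular literal) and @"\w+" (verbatim literal) are the same string.
    public static bool SameWordClass => "\\w+" == @"\w+";

    // "\t" is one tab character; @"\t" is a backslash followed by 't'.
    public static int RegularTabLength => "\t".Length;
    public static int VerbatimTabLength => @"\t".Length;

    // Both forms match a tab: the first passes a literal tab to the regex
    // engine, the second passes the regex escape \t.
    public static bool BothMatchTab =>
        Regex.IsMatch("\t", "^\t$") && Regex.IsMatch("\t", @"^\t$");
}
```

So in the question's pattern the tab is resolved by the C# compiler, while in the verbatim version it is resolved by the regex engine; the match results are the same either way.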


Usually in C, C++ and C#, strings inside a program use "\n" as the line separator. "\r\n" appears only at the I/O level, if text translation is enabled.


Source: https://habr.com/ru/post/1304609/

