Regex.Split() weird behavior

I used the following regex to split the fields of a line in a text file, but during testing I found a strange error: a fairly simple line was split clearly incorrectly. Sample code to illustrate this behavior:

    const string line = "511525,3122,9,39,2007,9,39,3127,9,39,\" -49,368.11 \",\"-32,724.16\",2,1,\" 2,347.91 \", - ,\" 2,234.17 \", - ,2.2,1.143,2,1.24,FALSE,1,2,0,311,511625";
    const string pattern = ",(?=([^\"]*\"[^\"]*\")*[^\"]*$)";

    Console.WriteLine();
    Console.WriteLine("SPLIT");
    var splitted = Regex.Split(line, pattern, RegexOptions.Compiled);
    foreach (var s in splitted)
    {
        Console.WriteLine(s);
    }

    Console.WriteLine();
    Console.WriteLine("REPLACE");
    var replaced = Regex.Replace(line, pattern, "!", RegexOptions.Compiled);
    Console.WriteLine(replaced);

    Console.WriteLine();
    Console.WriteLine("MATCH");
    var matches = Regex.Matches(line, pattern);
    foreach (Match match in matches)
    {
        Console.WriteLine(match.Index);
    }

As you can see, Split is the only method that gives unexpected results (it breaks the string at invalid positions!). Both Matches and Replace give absolutely correct results. I even checked the regex in RegexBuddy, and it showed the same matches as Regex.Matches! Am I missing something, or is this a bug in the Split method?

Console output:

    SPLIT
    511525
    , - ," 2,234.17 "
    3122
    , - ," 2,234.17 "
    9
    , - ," 2,234.17 "
    39
    , - ," 2,234.17 "
    2007
    , - ," 2,234.17 "
    9
    , - ," 2,234.17 "
    39
    , - ," 2,234.17 "
    3127
    , - ," 2,234.17 "
    9
    , - ," 2,234.17 "
    39
    , - ," 2,234.17 "
    " -49,368.11 "
    , - ," 2,234.17 "
    "-32,724.16"
    , - ," 2,234.17 "
    2
    , - ," 2,234.17 "
    1
    , - ," 2,234.17 "
    " 2,347.91 "
     - ," 2,234.17 "
     - 
    " 2,234.17 "
    " 2,234.17 "
     - 
    2.2
    1.143
    2
    1.24
    FALSE
    1
    2
    0
    311
    511625

    REPLACE
    511525!3122!9!39!2007!9!39!3127!9!39!" -49,368.11 "!"-32,724.16"!2!1!" 2,347.91 "! - !" 2,234.17 "! - !2.2!1.143!2!1.24!FALSE!1!2!0!311!511625

    MATCH
    6
    11
    13
    16
    21
    23
    26
    31
    33
    36
    51
    64
    66
    68
    81
    87
    100
    106
    110
    116
    118
    123
    129
    131
    133
    135
    139
2 answers

Judging by the answer you got from Microsoft (add ExplicitCapture), the problem is the capture group. The ExplicitCapture option turns this unnamed capture group into a non-capturing group.

You can do the same without the option by making the group explicitly non-capturing:

 const string pattern = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)"; 

which, when tested in LINQPad, seems to produce the results you are looking for.
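As a rough check of the claim above, the two patterns can be run side by side on the line from the question. This is only a sketch: the class name `SplitComparison` and its members are hypothetical, introduced for illustration. Since the question's MATCH output lists 27 separator positions, the non-capturing pattern should yield 28 fields, while the capturing pattern yields more elements because Split adds the captured text.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of the original question.
public static class SplitComparison
{
    // The input line from the question.
    public const string Line = "511525,3122,9,39,2007,9,39,3127,9,39,\" -49,368.11 \",\"-32,724.16\",2,1,\" 2,347.91 \", - ,\" 2,234.17 \", - ,2.2,1.143,2,1.24,FALSE,1,2,0,311,511625";

    // Original pattern: the group inside the lookahead is a capture group.
    public const string Capturing = ",(?=([^\"]*\"[^\"]*\")*[^\"]*$)";

    // Same pattern with the group marked non-capturing via (?: ... ).
    public const string NonCapturing = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    public static void Main()
    {
        string[] withCapture = Regex.Split(Line, Capturing);
        string[] withoutCapture = Regex.Split(Line, NonCapturing);
        int separators = Regex.Matches(Line, NonCapturing).Count;

        Console.WriteLine($"separators: {separators}");
        Console.WriteLine($"elements with capture group: {withCapture.Length}");
        Console.WriteLine($"elements without capture group: {withoutCapture.Length}");
    }
}
```

Both patterns match at the same positions; only the contents of the array returned by Split differ.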

The effect of capture groups is described in the docs for Regex.Split:

If capturing parentheses are used in a Regex.Split expression, any captured text is included in the resulting string array. For example, if you split the string "plum-pear" on a hyphen placed within capturing parentheses, the returned array includes a string element that contains the hyphen.
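The "plum-pear" example from the docs can be reproduced in a few lines. This is a minimal sketch; the class name `PlumPear` and the helper `SplitOn` are hypothetical names chosen for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class for the documented "plum-pear" example.
public static class PlumPear
{
    // Splits the fixed input "plum-pear" with the given pattern.
    public static string[] SplitOn(string pattern) => Regex.Split("plum-pear", pattern);

    public static void Main()
    {
        // Plain hyphen: only the pieces between the separators are returned.
        Console.WriteLine(string.Join("|", SplitOn("-")));   // plum|pear

        // Hyphen inside capturing parentheses: the captured hyphen is returned too.
        Console.WriteLine(string.Join("|", SplitOn("(-)"))); // plum|-|pear
    }
}
```

The same mechanism explains the extra elements in the question's SPLIT output: each one is the text captured by the group inside the lookahead.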


MS Solution

(adding the ExplicitCapture regex option)
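A short sketch of how the option might be applied, keeping the question's pattern unchanged. The class name `ExplicitCaptureDemo` and the shortened sample line are assumptions made for illustration, not taken from the Microsoft answer.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; the sample line is a shortened stand-in
// for the question's data.
public static class ExplicitCaptureDemo
{
    // The original pattern from the question, capture group left as is.
    public const string Pattern = ",(?=([^\"]*\"[^\"]*\")*[^\"]*$)";

    // With ExplicitCapture, unnamed groups do not capture, so Split
    // returns only the text between the separators.
    public static string[] SplitFields(string line) =>
        Regex.Split(line, Pattern, RegexOptions.ExplicitCapture);

    public static void Main()
    {
        string[] fields = SplitFields("a,\"b,c\",d");
        Console.WriteLine(string.Join("|", fields)); // a|"b,c"|d
    }
}
```

The option can be combined with others, for example `RegexOptions.Compiled | RegexOptions.ExplicitCapture`, if the compiled behavior from the question is still wanted.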


Source: https://habr.com/ru/post/1391460/

