Why is the for loop on the regex slow?

I have the following code:

    string pattern = @"(?:\S+\s){1,6}\S*" + search + @"\S*(?:\s\S+){1,6}";
    String dbContents = row[2].ToString();
    var matches = Regex.Matches(dbContents, pattern, RegexOptions.IgnoreCase | RegexOptions.Compiled);
    for (int i = 0; i < matches.Count; i++)
    {
        if (i == 3) break;
        Contents += String.Format("... {0} ...", matches[i].Value);
    }

What I'm trying to accomplish is to get one to six words before the search term and one to six words after it. When I run the code, the bottleneck is matches.Count in the for loop. With very large strings, it takes more than a minute to execute. I am confused about why this happens and what to do to solve the problem.

3 answers

To compute Count, it has to find every match in order to count them. Given that you stop after three anyway, this seems a little pointless.

Use MatchCollection's lazy evaluation in combination with the LINQ Take method to evaluate only the first three matches. It is also generally recommended to use a StringBuilder instead of concatenating strings in a loop:

    StringBuilder builder = new StringBuilder(...);
    foreach (var match in matches.Cast<Match>().Take(3))
    {
        // Use the loop variable; there is no index i in a foreach
        builder.AppendFormat("... {0} ...", match.Value);
    }

(Switching to a StringBuilder may not make much difference here, but it's a good habit to get into. The Cast call is required because Enumerable.Take only works with the generic IEnumerable<T>.)


From MSDN:

The Matches method uses lazy evaluation to populate the returned MatchCollection. Accessing members of this collection, such as MatchCollection.Count and MatchCollection.CopyTo, causes the collection to be populated immediately. To take advantage of lazy evaluation, you should iterate over the collection using a construct like foreach in C#.

Bottom line: change your code to use foreach.
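As a rough illustration of that advice, here is a minimal sketch of the question's loop rewritten with foreach. The SnippetDemo class, the FirstSnippets helper, its parameters, and the sample text are hypothetical names introduced for this example, and the Regex.Escape call is an added assumption (that the search term is plain text, not a pattern); none of them come from the original code.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

static class SnippetDemo
{
    // Hypothetical helper: builds up to maxSnippets context snippets around `search`.
    public static string FirstSnippets(string text, string search, int maxSnippets)
    {
        var builder = new StringBuilder();
        if (maxSnippets <= 0) return builder.ToString();

        // Assumption: the search term is literal text, so it is escaped here.
        string pattern = @"(?:\S+\s){1,6}\S*" + Regex.Escape(search) + @"\S*(?:\s\S+){1,6}";
        var matches = Regex.Matches(text, pattern, RegexOptions.IgnoreCase | RegexOptions.Compiled);

        int taken = 0;
        // foreach pulls matches one at a time; Count is never touched,
        // so the rest of the input is not scanned once we break.
        foreach (Match match in matches)
        {
            builder.AppendFormat("... {0} ...", match.Value);
            if (++taken == maxSnippets) break;
        }
        return builder.ToString();
    }

    static void Main()
    {
        string text = "one two three needle four five six";
        // Prints: ... one two three needle four five six ...
        Console.WriteLine(FirstSnippets(text, "needle", 3));
    }
}
```

The break sits after the append so that the enumerator is not asked for a fourth match it would then discard.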


Another way to do this is to call Match and then NextMatch, for example:

    var match = Regex.Match(dbContents, pattern, RegexOptions.IgnoreCase | RegexOptions.Compiled);
    for (int i = 0; i < 3 && match.Success; i++)
    {
        // Read the current match, then advance to the next one
        Contents += String.Format("... {0} ...", match.Value);
        match = match.NextMatch();
    }

Source: https://habr.com/ru/post/1498799/

