Faster replacement for Regex

I have about 100 Regex calls in my class. Each call covers a different type of data in a text protocol, but I have many files, and according to profiling, Regex accounts for 88% of my code's execution time.

Much of the code looks like this:

    {
        Match m_said = Regex.Match(line, @"(.*) said,", RegexOptions.IgnoreCase);
        if (m_said.Success)
        {
            string playername = m_said.Groups[1].Value;
            // some action
            return true;
        }
    }
    {
        Match ma = Regex.Match(line, @"(.*) is connected", RegexOptions.IgnoreCase);
        if (ma.Success)
        {
            string playername = ma.Groups[1].Value;
            // some action
            return true;
        }
    }
    {
        Match ma = Regex.Match(line, @"(.*): brings in for (.*)", RegexOptions.IgnoreCase);
        if (ma.Success)
        {
            string playername = ma.Groups[1].Value;
            long amount = Detect_Value(ma.Groups[2].Value, line);
            // some action
            return true;
        }
    }

Is it possible to replace Regex with another faster solution?

+6
5 answers

For regular expressions that are tested in a loop, it is often faster to precompile them outside the loop and just query them inside the loop.

First declare the different Regex objects with their respective patterns, then, in a second step, only call Match() with the text to test.
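As a minimal sketch of that two-step approach, using two of the patterns from the question: the class name LineParser and the method TryGetPlayer are hypothetical, and RegexOptions.Compiled is optional here (it trades a slower first use for faster matching).

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineParser
{
    // Step 1: the Regex objects are constructed once, when the class is first used.
    private static readonly Regex SaidRegex =
        new Regex(@"(.*) said,", RegexOptions.IgnoreCase | RegexOptions.Compiled);

    private static readonly Regex ConnectedRegex =
        new Regex(@"(.*) is connected", RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Step 2: each call only runs Match() against the prebuilt objects.
    // Returns true and sets playerName when one of the patterns matches.
    public static bool TryGetPlayer(string line, out string playerName)
    {
        Match m = SaidRegex.Match(line);
        if (!m.Success)
        {
            m = ConnectedRegex.Match(line);
        }

        playerName = m.Success ? m.Groups[1].Value : null;
        return m.Success;
    }
}
```

Whether this helps noticeably depends on how often the method runs; the static Regex.Match overloads already cache a limited number of recently used patterns, so measuring before and after is worthwhile.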

+8

Besides precompiling your regular expression, you can gain (possibly much larger) performance benefits by writing a more precise regular expression. In this regard, .* is almost always a bad choice:

(.*) is connected means: first match the entire string (the .* part), then back off one character at a time until the rest of the pattern, is connected, matches.

Unless the line is very short or is connected appears very close to the end of the line, that is a lot of backtracking, which costs time.

So, if you can narrow down what a permitted match looks like, you can improve performance.

For example, if only alphanumeric characters are allowed, then (\w+) is connected will do. If any non-whitespace characters are allowed, use (\S+) is connected. And so on, depending on the rules for a valid match.
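A small sketch of the tightened pattern, under the assumption that player names never contain whitespace (the question does not say so); the names TightPatterns and PlayerOrNull are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TightPatterns
{
    // Assumption: a player name is a run of non-whitespace characters.
    // \S+ cannot swallow the whole line the way .* does, so far less backtracking is needed.
    private static readonly Regex Connected =
        new Regex(@"(\S+) is connected", RegexOptions.IgnoreCase);

    // Returns the captured player name, or null when the line does not match.
    public static string PlayerOrNull(string line)
    {
        Match m = Connected.Match(line);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

Note that this is not a drop-in replacement: for a line such as "[12:00] Bob is connected", the original (.*) captures "[12:00] Bob", while (\S+) captures only "Bob".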

In your specific example, you don't seem to be doing anything with the captured match, so you could even drop the regular expression altogether and just look for a fixed substring. Which method ends up being the fastest depends on your actual input and requirements.
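A fixed-substring version might look roughly like the sketch below. SubstringParser and TryGetConnectedPlayer are hypothetical names, and it assumes each line is a single line of text and that ordinal case-insensitive comparison is acceptable in place of RegexOptions.IgnoreCase.

```csharp
using System;

public static class SubstringParser
{
    private const string ConnectedMarker = " is connected";

    // Returns true and sets playerName to the text before the marker.
    public static bool TryGetConnectedPlayer(string line, out string playerName)
    {
        // LastIndexOf mirrors the greedy (.*): the capture extends to the last occurrence.
        int idx = line.LastIndexOf(ConnectedMarker, StringComparison.OrdinalIgnoreCase);
        if (idx < 0)
        {
            playerName = null;
            return false;
        }

        playerName = line.Substring(0, idx);
        return true;
    }
}
```

If the player name is not needed at all, the same LastIndexOf (or IndexOf) call can serve as the test without the Substring step.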

+3

I don't know whether you can reuse the expressions or whether the method is called several times, but if so, you should precompile your regular expressions. Try the following:

 private static readonly Regex xmlRegex = new Regex("YOUR EXPRESSION", RegexOptions.Compiled); 

In your example, the expression is "compiled" every time the method is used, which is unnecessary, since the expression is a constant. Now it is compiled only once. The disadvantage is that the first access to the expression is a little slower.

+2

You can try compiling Regex in advance or consider combining all the individual Regex expressions into one (monster) Regex:

 Match m_said = Regex.Match(line, @"(.*) (said|(is connected)|...|...),", RegexOptions.IgnoreCase); 

You can then check the second capture group to determine what type of match has occurred.
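One possible shape for this, sketched with only two of the alternatives and with named groups instead of numbered ones: CombinedParser, Classify and the group names player and kind are all assumptions made for the example, not part of the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CombinedParser
{
    // One pattern for several message types; the "kind" group records which one matched.
    private static readonly Regex Combined = new Regex(
        @"(?<player>.*) (?<kind>said,|is connected)",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Returns "said", "connected" or "none", and sets playerName on a match.
    public static string Classify(string line, out string playerName)
    {
        Match m = Combined.Match(line);
        if (!m.Success)
        {
            playerName = null;
            return "none";
        }

        playerName = m.Groups["player"].Value;
        string kind = m.Groups["kind"].Value;
        return kind.StartsWith("said", StringComparison.OrdinalIgnoreCase)
            ? "said"
            : "connected";
    }
}
```

Be aware that this can behave differently from the sequential checks when a line contains more than one of the phrases: the greedy .* makes the combined pattern pick whichever phrase occurs last.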

+1

I know that Regex can do many things, but here is a benchmark comparing Regex with char.Split and string.split:

http://www.dotnetperls.com/split in the Benchmarks section
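For the "brings in for" lines from the question, a Split-based version could look something like this. It is only a sketch: SplitParser and TrySplitBringsIn are hypothetical names, the comparison is case-sensitive (unlike RegexOptions.IgnoreCase), and the split happens at the first occurrence of the separator rather than the last.

```csharp
using System;

public static class SplitParser
{
    private static readonly string[] Separator = { ": brings in for " };

    // Splits "<player>: brings in for <amount>" into its two parts.
    // The amount is returned as raw text; converting it is left to the caller.
    public static bool TrySplitBringsIn(string line, out string playerName, out string amountText)
    {
        // The count of 2 keeps everything after the first separator in one piece.
        string[] parts = line.Split(Separator, 2, StringSplitOptions.None);
        if (parts.Length != 2)
        {
            playerName = null;
            amountText = null;
            return false;
        }

        playerName = parts[0];
        amountText = parts[1];
        return true;
    }
}
```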

+1

Source: https://habr.com/ru/post/906475/

