String.IndexOf performance

This simple piece of C# code, which is meant to find script blocks in HTML, takes 0.5 seconds to run on a 74K-character string with only 9 script blocks in it. This is a release binary, run without the debugger, on a 2.8 GHz i7 CPU. I ran the code several times to make sure that JIT compilation is not hurting performance. It is not.

This is VS2010, .NET 4.0 Client Profile, x64.

Why is it so slow?

    int[] _exclStart = new int[100];
    int[] _exclStop = new int[100];
    int _excl = 0;
    for (int f = input.IndexOf("<script", 0); f != -1; )
    {
        _exclStart[_excl] = f;
        f = input.IndexOf("</script", f + 8);
        if (f == -1)
        {
            _exclStop[_excl] = input.Length;
            break;
        }
        _exclStop[_excl] = f;
        f = input.IndexOf("<script", f + 8);
        ++_excl;
    }
+6
5 answers

I used the source of this page as an example, then duplicated the content 8 times, resulting in a page of 334,312 bytes. Using StringComparison.Ordinal makes a huge difference in performance.

    string newInput = string.Format("{0}{0}{0}{0}{0}{0}{0}{0}", input.Trim().ToLower());
    //string newInput = input.Trim().ToLower();

    System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
    sw.Start();
    int[] _exclStart = new int[100];
    int[] _exclStop = new int[100];
    int _excl = 0;
    for (int f = newInput.IndexOf("<script", 0, StringComparison.Ordinal); f != -1; )
    {
        _exclStart[_excl] = f;
        f = newInput.IndexOf("</script", f + 8, StringComparison.Ordinal);
        if (f == -1)
        {
            _exclStop[_excl] = newInput.Length;
            break;
        }
        _exclStop[_excl] = f;
        f = newInput.IndexOf("<script", f + 8, StringComparison.Ordinal);
        ++_excl;
    }
    sw.Stop();
    Console.WriteLine(sw.Elapsed.TotalMilliseconds);

Running it 5 times gives almost the same result each time (the loop timings did not change significantly, so for this simple code almost no time is spent on JIT compilation).

Output using the original code (in milliseconds):

    10.2786
    11.4671
    11.1066
    10.6537
    10.0723

Output using the above code (in milliseconds):

    0.3055
    0.2953
    0.2972
    0.3112
    0.3347

Please note that my test results are approximately 0.010 seconds (original code) and 0.0003 seconds (ordinal code). This means that something other than this code is wrong on your side.

If, as you say, using StringComparison.Ordinal does nothing for your performance, then either you are using the wrong timers to measure it, or you have a large overhead in reading the input value, for example reading it again from a stream without realizing it.

Tested under Windows 7 x64, running on a 3 GHz i5, using the .NET Client Profile.

Suggestions:

  • Use StringComparison.Ordinal
  • Make sure you use System.Diagnostics.Stopwatch to time performance
  • Declare a local variable for the input instead of using values external to the function (for example: string newInput = input.Trim().ToLower();)

Again, I emphasize that I am getting a 50 times faster result on test data that is roughly 4 times larger, using the exact same code that you provided. This means my test runs about 200 times faster than yours, which is not what anyone would expect if we were both running in the same environment with only an i5 (me) versus an i7 (you) as the difference.
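
The timing suggestion above can be sketched as a small harness. This is only an illustration, not the code used for the measurements in this answer: the names IndexOfTiming, AverageMilliseconds and BuildInput are hypothetical, and the synthetic input (9 blocks with 8,000 filler characters each) is an assumption that merely resembles the size described in the question.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class IndexOfTiming
{
    // Hypothetical helper: runs the action once untimed (so JIT cost is
    // excluded), then returns the average milliseconds per timed run.
    public static double AverageMilliseconds(Action action, int iterations)
    {
        if (action == null) throw new ArgumentNullException("action");
        if (iterations <= 0) throw new ArgumentOutOfRangeException("iterations");

        action(); // warm-up run, not timed

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds / iterations;
    }

    // Builds synthetic input: filler text followed by one script block,
    // repeated `blocks` times. The sizes are arbitrary assumptions.
    public static string BuildInput(int blocks, int fillerLength)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < blocks; i++)
        {
            sb.Append('x', fillerLength);
            sb.Append("<script>var a = 1;</script>");
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string input = BuildInput(9, 8000);
        var comparisons = new[] { StringComparison.CurrentCulture, StringComparison.Ordinal };

        foreach (StringComparison comparison in comparisons)
        {
            // "</html" never occurs in the input, so each call scans the whole string.
            double ms = AverageMilliseconds(() => input.IndexOf("</html", 0, comparison), 100);
            Console.WriteLine("{0}: {1} ms per call", comparison, ms);
        }
    }
}
```

Building the input once, outside the timed delegate, keeps any cost of reading or preparing the string out of the measurement.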

+16

I would recommend using Regex for this. It offers a significant performance boost, since the expression is compiled only once. IndexOf, on the other hand, is essentially a loop that works character by character, which probably means you have 3 loops inside your main loop. Of course IndexOf will not be as slow as a hand-written loop, but the time still grows as the input size increases. Regex has built-in functions that return the number and position of the occurrences of each pattern you define.

Edit: this may shed some light on IndexOf performance: "IndexOf Perf".
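
A minimal sketch of the Regex approach, under stated assumptions: the class name ScriptBlockRegex and the pattern are illustrative and not taken from this answer. The pattern is meant to mirror the question's loop, which records the index of each "<script" and the index of the following "</script" (or the end of the input if the block is never closed). Whether this is actually faster than an ordinal IndexOf loop is not established here and is worth measuring on real input.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ScriptBlockRegex
{
    // Constructed once and reused for every call. The lazy ".*?" stops at
    // the first "</script" (lookahead) or at the end of the input (\z).
    private static readonly Regex ScriptBlock = new Regex(
        @"<script.*?(?=</script|\z)",
        RegexOptions.Compiled | RegexOptions.IgnoreCase |
        RegexOptions.Singleline | RegexOptions.CultureInvariant);

    // Returns (start, stop) index pairs, one per script block.
    public static List<Tuple<int, int>> FindBlocks(string html)
    {
        if (html == null) throw new ArgumentNullException("html");

        var blocks = new List<Tuple<int, int>>();
        foreach (Match m in ScriptBlock.Matches(html))
        {
            blocks.Add(Tuple.Create(m.Index, m.Index + m.Length));
        }
        return blocks;
    }

    public static void Main()
    {
        string html = "ab<script>x</script>cd<script src=a>";
        foreach (Tuple<int, int> block in FindBlocks(html))
        {
            Console.WriteLine("start={0}, stop={1}", block.Item1, block.Item2);
        }
    }
}
```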

+3

The IndexOf overload that you are using is culture-sensitive, which affects performance. Use this instead:

 input.IndexOf("<script", 0, StringComparison.Ordinal); 
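
As a sketch of how the ordinal overload could be applied to the whole loop: the version below uses StringComparison.OrdinalIgnoreCase, which is an assumption on my part (it avoids having to lower-case the input first), and collects the results in a list instead of fixed-size arrays. The name ScriptBlockFinder is hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class ScriptBlockFinder
{
    private const string OpenTag = "<script";
    private const string CloseTag = "</script";

    // Returns (start, stop) index pairs: start is the index of "<script",
    // stop is the index of the next "</script", or html.Length if unclosed.
    public static List<Tuple<int, int>> FindBlocks(string html)
    {
        if (html == null) throw new ArgumentNullException("html");

        var blocks = new List<Tuple<int, int>>();
        int start = html.IndexOf(OpenTag, 0, StringComparison.OrdinalIgnoreCase);
        while (start != -1)
        {
            int stop = html.IndexOf(CloseTag, start + OpenTag.Length,
                                    StringComparison.OrdinalIgnoreCase);
            if (stop == -1)
            {
                // Unclosed block: it runs to the end of the input.
                blocks.Add(Tuple.Create(start, html.Length));
                break;
            }
            blocks.Add(Tuple.Create(start, stop));
            start = html.IndexOf(OpenTag, stop + CloseTag.Length,
                                 StringComparison.OrdinalIgnoreCase);
        }
        return blocks;
    }

    public static void Main()
    {
        foreach (Tuple<int, int> block in FindBlocks("ab<SCRIPT>x</Script>cd<script src=a>"))
        {
            Console.WriteLine("start={0}, stop={1}", block.Item1, block.Item2);
        }
    }
}
```

Ordinal comparison works on raw character values, so a match always has the same length as the search string, and advancing by the tag length never moves the start index past the end of the input.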
+3

I just tested IndexOf performance with .NET 4.0 on Windows 7:

    public void Test()
    {
        var input = "Hello world, I'm ekk. This is test string";
        TestStringIndexOfPerformance(input, StringComparison.CurrentCulture);
        TestStringIndexOfPerformance(input, StringComparison.InvariantCulture);
        TestStringIndexOfPerformance(input, StringComparison.Ordinal);
        Console.ReadLine();
    }

    private static void TestStringIndexOfPerformance(string input, StringComparison stringComparison)
    {
        var count = 0;
        var startTime = DateTime.UtcNow;
        TimeSpan result;
        for (var index = 0; index != 1000000; index++)
        {
            count = input.IndexOf("<script", 0, stringComparison);
        }
        result = DateTime.UtcNow.Subtract(startTime);
        Console.WriteLine("{0}: {1}", stringComparison, count);
        Console.WriteLine("Total time: {0}", result.TotalMilliseconds);
        Console.WriteLine("--------------------------------");
    }

And the result:

    CurrentCulture: Total time: 225.4008
    InvariantCulture: Total time: 187.2003
    Ordinal: Total time: 124.8003

As you can see, Ordinal performs better: about 125 ms versus 225 ms for CurrentCulture over one million calls.

+2

I won't discuss here whether the code should probably be written with Regex and so on, but to me it is slow because the IndexOf() *inside* the for loop always rescans the string from the beginning (it always starts from index 0). Try scanning from the last occurrence found.
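
Scanning from the last occurrence means passing a start index to IndexOf. Note that the loop in the question already does this (it passes f + 8), so the sketch below only illustrates the general pattern. The names StringSearch and AllIndexesOf are hypothetical, and the use of StringComparison.Ordinal is an assumption carried over from the other answers.

```csharp
using System;
using System.Collections.Generic;

public static class StringSearch
{
    // Lazily yields the index of every non-overlapping occurrence of
    // `value` in `text`, resuming each search just past the previous match.
    public static IEnumerable<int> AllIndexesOf(string text, string value)
    {
        if (text == null) throw new ArgumentNullException("text");
        if (string.IsNullOrEmpty(value))
            throw new ArgumentException("value must be non-empty", "value");

        int index = text.IndexOf(value, 0, StringComparison.Ordinal);
        while (index != -1)
        {
            yield return index;
            // Resume after the match instead of rescanning from index 0.
            index = text.IndexOf(value, index + value.Length, StringComparison.Ordinal);
        }
    }

    public static void Main()
    {
        foreach (int i in AllIndexesOf("a<script>b<script>", "<script"))
        {
            Console.WriteLine(i);
        }
    }
}
```

Because each search resumes where the previous one stopped, every character of the input is examined a bounded number of times, so the total work grows linearly with the input length.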

+1

Source: https://habr.com/ru/post/900324/

