What is the most efficient string search method (read time)? (FROM#)

I found that my program is looking for a lot of long lines (20,000+), trying to find a specific unique phrase.

What is the most efficient method for doing this in C #?

Below is the current code, which works as follows:

  • The search begins with startPos, because the target area is somewhat removed from the very beginning.
  • It goes through the line, at each step it checks whether the substring starts from this point using startMatchString, which is an indicator that the beginning of the target line has been found. (The length of the target line varies).
  • Here it creates a new substring (interrupting 11 characters that mark the beginning of the target string) and searches for endMatchString

I already know that this is a terribly complex and possibly very inefficient algorithm. What is the best way to achieve the same result?

string result = string.Empty;    
for (int i = startPos; i <= response.Length - 1; i++)
{
   if (response.Substring(i).StartsWith(startMatchString))
   {
       string result = response.Substring(i).Substring(11);
       for (int j = 0; j <= result.Length - 1; j++)
       {
           if (result.Substring(j).StartsWith(endMatchString))
           {
               return result.Remove(j)
           }
       }
   }
}
return result;
+3
source share
8 answers

You can use String.IndexOf, but be sure to use StringComparison.Ordinal or it can be one order slower.

private string Search2(int startPos, string startMatchString, string endMatchString, string response) {
    int startMarch = response.IndexOf(startMatchString, startPos, StringComparison.Ordinal);
    if (startMarch != -1) {
        startMarch += startMatchString.Length;
        int endMatch = response.IndexOf(endMatchString, startMarch, StringComparison.Ordinal);
        if (endMatch != -1) { return response.Substring(startMarch, endMatch - startMarch); }
    }
    return string.Empty;
}

A 1000-fold search for a string of about 40% of an 183 KB file took about 270 milliseconds. Without StringComparison.Ordinal, it took about 2000 milliseconds.
Finding 1 time with your method took more than 60 seconds, as it creates a new line (O (n)) at each iteration, making your method O (n ^ 2).

+7
source

There is a whole group of algorithms,

  • boy and sea
  • Sunday
  • --
  • -

Boyer-Moore Boyer-Moore-Horspool.

C- . java

http://www.fmi.uni-sofia.bg/fmi/logic/vboutchkova/sources/BoyerMoore_java.html

http://www.ibm.com/developerworks/java/library/j-text-searching.html

, .

+7

, . IndexOf/Contains, , , Regex .

+6

, . , .

+4

; .

IndexOf...

string result = string.Empty;

if (startPos >= response.Length) 
    return result;

int startingIndex = response.IndexOf(startMatchString, startPos);
int rightOfStartIndex = startingIndex + startMatchString.Length;

if (startingIndex > -1 && rightOfStartIndex < response.Length)
{
    int endingIndex = response.IndexOf(endMatchString, rightOfStartIndex);
    if (endingIndex > -1)
        result = response.Substring(rightOfStartIndex, endingIndex - rightOfStartIndex);
}

return result;
0

-. , , CodeProject .

0

, regex - . , RegularExpressions.Group. .

0

IndexOf ( beware: , ):

int skip = 11;
int start = response.IndexOf(startMatchString, startPos);
if (start >= 0)
{
   int end = response.IndexOf(startMatchString, start + skip);
   if (end >= 0)
       return response.Substring(start + skip, end - start - skip);
   else
       return response.Substring(start + skip);
}
return string.Empty;
0

Source: https://habr.com/ru/post/1704020/


All Articles