What is the most efficient string search method (read time)? (FROM#)

Question

What is the most efficient string search method (read time)? (FROM#)

I found that my program is looking for a lot of long lines (20,000+), trying to find a specific unique phrase.

What is the most efficient method for doing this in C #?

Below is the current code, which works as follows:

The search begins with startPos, because the target area is somewhat removed from the very beginning.
It goes through the line, at each step it checks whether the substring starts from this point using startMatchString, which is an indicator that the beginning of the target line has been found. (The length of the target line varies).
Here it creates a new substring (interrupting 11 characters that mark the beginning of the target string) and searches for endMatchString

I already know that this is a terribly complex and possibly very inefficient algorithm. What is the best way to achieve the same result?

string result = string.Empty;    
for (int i = startPos; i <= response.Length - 1; i++)
{
   if (response.Substring(i).StartsWith(startMatchString))
   {
       string result = response.Substring(i).Substring(11);
       for (int j = 0; j <= result.Length - 1; j++)
       {
           if (result.Substring(j).StartsWith(endMatchString))
           {
               return result.Remove(j)
           }
       }
   }
}
return result;

+3

string c # algorithm search

Andrew Harry Mar 01 '09 at 11:34

source share

8 answers

There is a whole group of algorithms,

boy and sea
Sunday
--
-

Boyer-Moore Boyer-Moore-Horspool.

C- . java

http://www.fmi.uni-sofia.bg/fmi/logic/vboutchkova/sources/BoyerMoore_java.html

http://www.ibm.com/developerworks/java/library/j-text-searching.html

, .

+7

Luixv 01 . '09 12:28

, . IndexOf/Contains, , , Regex .

+6

Brian Rasmussen 01 . '09 11:43

, . , .

+4

driis 01 . '09 11:39

; .

IndexOf...

string result = string.Empty;

if (startPos >= response.Length) 
    return result;

int startingIndex = response.IndexOf(startMatchString, startPos);
int rightOfStartIndex = startingIndex + startMatchString.Length;

if (startingIndex > -1 && rightOfStartIndex < response.Length)
{
    int endingIndex = response.IndexOf(endMatchString, rightOfStartIndex);
    if (endingIndex > -1)
        result = response.Substring(rightOfStartIndex, endingIndex - rightOfStartIndex);
}

return result;

0

Nescio 01 . '09 11:40

-. , , CodeProject .

0

Khadaji 01 . '09 11:58

, regex - . , RegularExpressions.Group. .

0

Ruben 01 . '09 12:00

IndexOf ( beware: , ):

int skip = 11;
int start = response.IndexOf(startMatchString, startPos);
if (start >= 0)
{
   int end = response.IndexOf(startMatchString, start + skip);
   if (end >= 0)
       return response.Substring(start + skip, end - start - skip);
   else
       return response.Substring(start + skip);
}
return string.Empty;

0

oefe 01 . '09 14:10

ggf31416 · Accepted Answer · 2009-03-01T13:59:41+0000

You can use String.IndexOf, but be sure to use StringComparison.Ordinal or it can be one order slower.

private string Search2(int startPos, string startMatchString, string endMatchString, string response) {
    int startMarch = response.IndexOf(startMatchString, startPos, StringComparison.Ordinal);
    if (startMarch != -1) {
        startMarch += startMatchString.Length;
        int endMatch = response.IndexOf(endMatchString, startMarch, StringComparison.Ordinal);
        if (endMatch != -1) { return response.Substring(startMarch, endMatch - startMarch); }
    }
    return string.Empty;
}

A 1000-fold search for a string of about 40% of an 183 KB file took about 270 milliseconds. Without StringComparison.Ordinal, it took about 2000 milliseconds.
Finding 1 time with your method took more than 60 seconds, as it creates a new line (O (n)) at each iteration, making your method O (n ^ 2).

What is the most efficient string search method (read time)? (FROM#)

More articles: