Performance std :: strstr vs. std :: string :: find

Question

Performance std :: strstr vs. std :: string :: find

Possible duplicate:
C ++ string :: find complexity

I recently found out that the std::string::find function is an order of magnitude slower than the std::strstr - in my environment with GCC 4.7 on Linux. The difference in performance depends on the length of the lines and the hardware architecture.

However, there is a simple reason for the difference: std::string::find basically calls std::memcmp in a loop - with O(m * n) time complexity. In contrast, std::strstr highly optimized for hardware architecture (e.g. with SSE instructions) and uses a more sophisticated string matching algorithm (apparently Knuth-Morris-Pratt).

I was also surprised if I did not find the time complexity of these two functions in language documents (i.e., draft N3290 and N1570). I just found temporary difficulties for char_traits . But this does not help, because there is no function to search for a substring in char_traits .

I would expect std::strstr and memmem contain similar optimizations with almost the same performance. And until recently, I assumed that std::string::find uses memmem internally.

Questions: Is there a good reason why std::string::find does not use std::memmem ? And is this different from another implementation?

The question is not: what is the best implementation of this function? In C ++, it's really hard to argue if it's slower than C. I wouldn’t care if both implementations are slow. This is a performance difference that really hurts.

+6

c ++ gcc stdstring strstr

nosid Apr 11 '12 at 6:33

source share

1 answer

James kanze · Accepted Answer · 2012-04-11T08:09:14+0000

First, what is memmem ? I cannot find this in the C ++ standard, Posix (which contains all the standard C functions).

Secondly, any measurement values will depend on the actual data. Using ILC, for example, will pessimize in many cases; probably the majority of cases when std::string member functions are used; the time to create the necessary tables will often be longer than the total time of the direct algorithm. Things like O(m*n) don't matter much when the typical string length is short.

Performance std :: strstr vs. std :: string :: find

More articles: