Have you ever used KMP or BM algorithms?

Question

Have you ever used KMP or BM algorithms?

I know that the KMP (Knuth-Morris-Pratt) and BM (Boyers-Moore) algorithms are all good string search algorithms. I also know that BM is 3-5 times faster than KMP.

Have you ever used BM or KMP algorithms in your programming experience for industrial software? Does the algorithm really matter here?

+4

algorithm

user699656 Apr 9 '11 at 5:24

source share

3 answers

Function

glibc strstr is linear. It uses a two-way algorithm , which I think is a variant of Boyer-Moore. So, I assume that anyone using strstr in gcc actually uses a fast real-time string search algorithm.

Regarding the question of whether there is a fast algorithm, IMHO is only relevant if the data size is large enough. The many explicit string operations that we do are very small lines (say, less than 500 characters). This does not mean that we do not perform heavy string operations (for example, full-text database searches), but in this case we usually allow the database or library to do the hard work for us. The database or library uses fast string search algorithms, so I would not say that they do not matter, only that its use is not directly displayed to us.

+3

MAK Apr 9 '11 at 9:04

source share

I implemented KMP once on hardware. If the hardware is FPGA, you can use reconfigurability and self-modifying circuits. These patterns get a search string. Then do the necessary hardware pre-provisioning and reconfigure yourself with the logic that KMP does. But here it is also necessary that you have to scan a large amount of data in order to get acceleration, but everything was there (for example, DNA matching).

+2

flolo Apr 9 '11 at 6:11

source share

jallmer · Accepted Answer · 2011-04-09T05:37:34+0000

If you look, for example, at the Java function String.indexOf, it seems that they use brute force to match strings. You may wonder why this is so.

The reason is that in these algorithms some preliminary processing of the requests is performed, and this can be expensive (especially for BM if you use both arrays). Therefore, the strings you are searching must be large before KMP and BM can beet brute force.

There is always trade when using different algorithms, and when working with large strings, you can also consider indexing text instead of a query (for example, suffix trees). It can even be useful when you are dealing with new texts every time.

In my opinion, these algorithms are quite academic and useful only in special circumstances.

Have you ever used KMP or BM algorithms?

More articles: