FindSubstrings and breakSubstring in Data.ByteString

Question

FindSubstrings and breakSubstring in Data.ByteString

The source Data/ByteString.hs says the findSubstrings function findSubstrings deprecated in favor of breakSubstring . However, I think that findSubstrings , which was implemented using the KMP algorithm, is much more efficient than the algorithm used in breakSubstring , which is naive. Does anyone know why this was done?

Here's the old implementation:

 {-# DEPRECATED findSubstrings "findSubstrings is deprecated in favour of breakSubstring." #-} {- {- This function uses the Knuth-Morris-Pratt string matching algorithm. -} findSubstrings pat@ (PS _ _ m) str@ (PS _ _ n) = search 0 0 where patc x = pat `unsafeIndex` x strc x = str `unsafeIndex` x -- maybe we should make kmpNext a UArray before using it in search? kmpNext = listArray (0,m) (-1:kmpNextL pat (-1)) kmpNextL p _ | null p = [] kmpNextL pj = let j' = next (unsafeHead p) j + 1 ps = unsafeTail p x = if not (null ps) && unsafeHead ps == patc j' then kmpNext Array.! j' else j' in x:kmpNextL ps j' search ij = match ++ rest -- i: position in string, j: position in pattern where match = if j == m then [(i - j)] else [] rest = if i == n then [] else search (i+1) (next (strc i) j + 1) next cj | j >= 0 && (j == m || c /= patc j) = next c (kmpNext Array.! j) | otherwise = j -}

And here is a new naive:

 findSubstrings :: ByteString -- ^ String to search for. -> ByteString -- ^ String to seach in. -> [Int] findSubstrings pat str | null pat = [0 .. length str] | otherwise = search 0 str where STRICT2(search) search ns | null s = [] | pat `isPrefixOf` s = n : search (n+1) (unsafeTail s) | otherwise = search (n+1) (unsafeTail s)

+6

performance string-matching haskell bytestring

sob7y Jun 13 '11 at 13:43

source share

1 answer

Don stewart · Answer 1 · 2011-06-13T16:37:09+0000

The reason for the change was that the implementation of KMP was actually more inefficient than the naive version, which was implemented taking into account performance.

FindSubstrings and breakSubstring in Data.ByteString

More articles: