The source Data/ByteString.hs says the findSubstrings function findSubstrings deprecated in favor of breakSubstring . However, I think that findSubstrings , which was implemented using the KMP algorithm, is much more efficient than the algorithm used in breakSubstring , which is naive. Does anyone know why this was done?
Here's the old implementation:
{-# DEPRECATED findSubstrings "findSubstrings is deprecated in favour of breakSubstring." #-} {- {- This function uses the Knuth-Morris-Pratt string matching algorithm. -} findSubstrings pat@ (PS _ _ m) str@ (PS _ _ n) = search 0 0 where patc x = pat `unsafeIndex` x strc x = str `unsafeIndex` x -- maybe we should make kmpNext a UArray before using it in search? kmpNext = listArray (0,m) (-1:kmpNextL pat (-1)) kmpNextL p _ | null p = [] kmpNextL pj = let j' = next (unsafeHead p) j + 1 ps = unsafeTail p x = if not (null ps) && unsafeHead ps == patc j' then kmpNext Array.! j' else j' in x:kmpNextL ps j' search ij = match ++ rest -- i: position in string, j: position in pattern where match = if j == m then [(i - j)] else [] rest = if i == n then [] else search (i+1) (next (strc i) j + 1) next cj | j >= 0 && (j == m || c /= patc j) = next c (kmpNext Array.! j) | otherwise = j -}
And here is a new naive:
findSubstrings :: ByteString -- ^ String to search for. -> ByteString -- ^ String to seach in. -> [Int] findSubstrings pat str | null pat = [0 .. length str] | otherwise = search 0 str where STRICT2(search) search ns | null s = [] | pat `isPrefixOf` s = n : search (n+1) (unsafeTail s) | otherwise = search (n+1) (unsafeTail s)
sob7y source share