This can be resolved in O (| S | + | L | + k) time, where k is the total number of matches of all lines from L to S. There are two main steps:
Launch Aho-Corasick . This will give you all matches of any string from L to S. This is done at the same time as above.
Initialize an array, A, of integers of length | S | + 1 to all zeros. March through the array, in position I set A [i] to A [i-1], if it is greater, then for each match M from L to S in position I set A [i + | M |] in max A [i + | M |] and A [i] + | M |.
Here is the code in Go that does just that. It uses the package I wrote, which has a convenient wrapper for calling Aho-Corasick.
package main import ( "fmt" "github.com/runningwild/stringz" ) func main() { input := []byte("thenuke") patterns := [][]byte{[]byte("hen"), []byte("thenu"), []byte("uke")} // This runs Aho-Corasick on the input string and patterns, it returns a // map, matches, such that matches[i] is a list of indices into input where // patterns[i] matches the input string. The running time of this is // O(|input| + |patterns| + k) and uses O(|patterns| + k) auxillary storage, // where k is the total number of matches found. find := stringz.FindSet(patterns) matches := find.In(input) // We want to invert the map so that it maps from index to all strings that // match at that index. at_pos := make([][]int, len(input)+1) for index, hits := range matches { for _, hit := range hits { at_pos[hit] = append(at_pos[hit], index) } } // Now we do a single pass through the string, at every position we will // keep track of how many characters in the input string we can match up to // that point. max := make([]int, len(input)+1) for i := range max { // If this position isn't as good as the previous position, then we'll use // the previous position. It just means that there is a character that we // were unable to match anything to. if i > 0 && max[i-1] > max[i] { max[i] = max[i-1] } // Look through any string that matches at this position, if its length is // L, then L positions later in the string we can have matched L more // character if we use this string now. This might mean that we don't use // another string that earlier we thought we'd be matching right now, we'll // find out later which one was better. for _, hit := range at_pos[i] { L := len(patterns[hit]) if i+L < len(max) && max[i+L] < max[i]+L { max[i+L] = max[i] + L } } } fmt.Printf("%v\n", max) }
source share