Ok, I found it! I found the best algorithm!
This may seem a little bold, but as long as the algorithm I'm going to suggest has both O(m + n)
runtime and O(m + n)
memory consumption, and the input data itself has the same properties, that the algorithm can only be optimized into a constant.
Algorithms Used
I am going to use a combination between KMP and Rabin Karp algorithms for my solution. Rabin Karp uses hashing to compare substrings of source strings. This requires a linear precalculation of time that uses linear additional memory, but from now on, the comparison between the substrings of two strings is O(1)
constant (this is amortized if you handle the conflicts correctly).
What my decision will not do
My solution will not find all occurrences in the first line that matches the second line with no more than 1 difference. However, the algorithm can be changed so that for each initial index in the first row, if there is such a match, at least one of them will be found (this will remain for the reader).
Observations
Let m
be the length of the second line and n
length of the first line. I am going to divide the task into two parts: if I am going to find a match with the maximum difference, I want to find the substrings of the first line: PREF
will be the substring before the only difference and the SUFF
substring after the difference. I want len(PREF) + len(SUFF) + 1 = m
, where PREF
or SUFF
will be artificially reduced if necessary (when the lines match no difference).
I am going to base my decision on one very important observation: suppose there is a substring of the first line starting with index i
with length m
, which corresponds to the second line with no more than one difference. Then, if you take PREF
as long as possible, there will still be a solution for SUFF
. This is obvious: I just maximize the difference to the end.
Algorithm
And now the algorithm itself follows. Start with regular KMP. Each time the extension of the prefix fails, and traces of failures must be executed, first check if the next letter is skipped, the remaining suffix will correspond to the remaining second line. If so, then the match found with a difference of not more than one character. If not, we continue with the usual KMP doing a Rabin Karp check every time a link to the failure is followed.
Let me clarify further the example of checking Rabin Karp. Suppose we are at a certain stage of KMP, and we find that first.substring[i, i + k - 1]
matches the first k
letters of the second line. Suppose also that the letter first[i + k]
is different from second[k]
. Then you check whether first.substring[i + k + 1, i + m - 1]
second.substring[k + 1, m - 1]
exactly second.substring[k + 1, m - 1]
with the help of Rabin Karp. This is exactly the case when you maximally increased the index of the initial form of the prefix i
and now you can try to see if there is a match with no more than one difference.
Rabin Karp will only be used when the fail link follows, which moves the initial prefix index with at least one, which means that no more than O(n)
Rabin Karp calls are used, each with O(1)
complexity, for a common linear complexity.