An algorithm for finding the minimum length of a substring that has all the characters of another string

Question

An algorithm for finding the minimum length of a substring that has all the characters of another string

I have two lines:
string1 - hello how are you ,
String2 - olo (including space)

Exit: lo ho (hel lo ho w is you)

lo ho is the only substring containing all characters of string2. Can someone suggest a good algorithm for this (I can only think of Brute Force algo - O (n ^ 2).

Also, the output should be a string of minimum length (in the case of several options).

+6

algorithm

instance May 30 '14 at 10:06

source share

4 answers

Niklas B. · Answer 1 · 2014-05-30T10:42:47+0000

Keep two pointers l and r and a hash table M = character -> count for characters in line2 that do not occur in s[l..r] .

First set l = 0 and r so that string1[l..r] contains all the characters of string2 (if possible). You do this by removing characters from M until they are empty.

Then continue by increasing r by one step at each step, and then increasing l as much as possible, while keeping M empty. The minimum over all r - l + 1 (the length of the substring s[l..r] ) is a solution.

Pythonic pseudo code:

 n = len(string1) M = {} # let say M is empty if it contains no positive values for c in string2: M[c]++ l = 0 r = -1 while r + 1 < n and M not empty: r++ M[string1[r]]-- if M not empty: return "no solution" answer_l, answer_r = l, r while True: while M[string1[l]] < 0: M[string1[l]]++ l++ if r - l + 1 < answer_r - anwer_l + 1: answer_l, answer_r = l, r r++ if r == n: break M[string1[r]]-- return s[answer_l..answer_r]

Empty checks can be implemented in O (1) if you maintain the number of positive records when performing increment and decrement operations.

Let n be the length of string1 and m be the length of string2 .

Note that l and r only increase, so no more than O (n) increments, and therefore no more than O (n) instructions, are executed in the last outer loop.

If m implemented as an array (I assume the alphabet is a constant size), you get O (n + m) runtime, which is optimal. If the alphabet is too large, you can use a hash table for the expected output (n + m).

Execution Example:

 string1 = "abbabcdbcb" string2 = "cbb" # after first loop M = { 'a': 0, 'b': 2, 'c': 1, 'd': 0 } # after second loop l = 0 r = 5 M = { 'a': -2, 'b': -1, 'c': 0, 'd': 0 } # increment l as much as possible: l = 2 r = 5 M = { 'a': -1, 'b': 0, 'c': 0, 'd': 0 } # increment r by one and then l as much as possible l = 2 r = 6 M = { 'a': -1, 'b': 0, 'c': 0, 'd': -1 } # increment r by one and then l as much as possible l = 4 r = 7 M = { 'a': 0, 'b': 0, 'c': 0, 'd': -1 } # increment r by one and then l as much as possible l = 4 r = 8 M = { 'a': 0, 'b': 0, 'c': -1, 'd': -1 } # increment r by one and then l as much as possible l = 7 r = 9 M = { 'a': 0, 'b': 0, 'c': 0, 'd': 0 }

The best solution is s [7..9].

Aprillion · Answer 2 · 2014-05-30T10:45:29+0000

I would calculate the character positions from string2 inside string1 , and then choose a permutation with a minimum distance between the smallest and highest character position:

 # positions are: # 01234567890123456 string1 = 'hello how are you' string2 = 'olo' # get string1 positions for each character from set(string2) positions = {'o': [4, 7, 15], 'l': [2, 3]} # get all permutations of positions (don't repeat the same element) # then pick the permutation with minimum distance between min and max position # (obviously, this part can be optimized, this is just an illustration) permutations = positions['o'] * positions['l'] * positions['o'] permutations = [[4,2,7], [4,3,7], [4,2,15], ...] the_permutation = [4,3,7] # voilà output = string1_without_spaces[3:7]

naota · Answer 3 · 2014-06-02T06:12:21+0000

This is an example implementation with JavaScript. The logic is similar to the @Aprillion logic described above.

DEMO: http://jsfiddle.net/ZB6vm/4/

 var s1 = "hello how are you"; var s2 = "olo"; var left, right; var min_distance; var answer = ""; // make permutation recursively function permutate(ar, arrs, k) { // check if the end of recursive call if (k == arrs.length) { var r = Math.max.apply(Math, ar); var l = Math.min.apply(Math, ar); var dist = r - l + 1; if (dist <= min_distance) { min_distance = dist; left = l; right = r; } return; } for (var i in arrs[k]) { var v = arrs[k][i]; if ($.inArray(v, ar) < 0) { var ar2 = ar.slice(); ar2.push(v); // recursive call permutate(ar2, arrs, k + 1); } } } function solve() { var ar = []; // 1-demension array to store character position var arrs = []; // 2-demension array to store character position for (var i = 0; i < s2.length; i++) { arrs[i] = []; var c = s2.charAt(i); for (var k = 0; k < s1.length; k++) { // loop by s1 if (s1.charAt(k) == c) { if ($.inArray(k, arrs[i]) < 0) { arrs[i].push(k); // save position found } } } } // call permutate permutate(ar, arrs, 0); answer = s1.substring(left, right + 1); alert(answer); } solve();

Hope this helps.

user1639485 · Answer 4 · 2014-11-26T06:08:10+0000

There is an algorithm that does this in O (N).

The idea: to have 2 arrays, namely: isRequired [256] and isFound [256], which reports the frequency of each character in S and, when analyzing the string S, the frequency of each character found. Also, keep a counter that tells you when a valid window is found. When a valid window is found, we can shift the window (to the right), supporting the specified invariant of the question.

C ++ program:

 void findMinWindow(const char *text, const char *pattern, int &start, int &end){ //Calcuate lengths of text and pattern int textLen = strlen(text); int patternLen = strlen(pattern); // Declare 2 arrays which keep tab of required & found frequency of each char in pattern int isRequired[256] ; //Assuming the character set is in ASCII int isFound[256]; int count = 0; //For ascertaining whether a valid window is found // Keep a tab of minimum window int minimumWindow = INT_MAX; //Prepare the isRequired[] array by parsing the pattern for(int i=0;i<patternLen;i++){ isRequired[pattern[i]]++; } //Let start parsing the text now // Have 2 pointers: i and j - both starting at 0 int i=0; int j=0; //Keep moving j forward, keep i fixed till we get a valid window for(c=j;c<textLen;c++){ //Check if the character read appears in pattern or not if(isRequired[text[c]] == 0){ //This character does not appear in the pattern; skip this continue; } //We have this character in the pattern, lets increment isFound for this char isFound[text[c]]++; //increment the count if this character satisfies the invariant if(isFound[text[c]] <= isRequired[text[c]]){ count++; } //Did we find a valid window yet? if(count == patternLen){ //A valid window is found..lets see if we can do better from here on //better means: increasing i to reduce window length while maintaining invariant while(isRequired[s[i]] == 0 || isFound[s[i]] > isRequired[s[i]]){ //Either of the above 2 conditions means we should increment i; however we // must decrease isFound for this char as well. //Hence do a check again if(isFound[s[i]] > isRequired[s[i]]){ isFound[s[i]]--; } i++; } // Note that after the while loop, the invariant is still maintained // Lets check if we did better int winLength = j-i+1; if(winLength < minimumWindow){ //update the references we got begin = i; end = j; //Update new minimum window lenght minimumWindow = winLength; } } //End of if(count == patternLen) } //End of for loop }

An algorithm for finding the minimum length of a substring that has all the characters of another string

More articles: