Find the smallest substring containing the given set of letters in the larger string

Say you have the following line:

FJKAUNOJDCUTCRHBYDLXKEODVBWTYPTSHASQQFCPRMLDXIJMYPVOHBDUGSMBLMVUMMZYHULSUIZIMZTICQORLNTOVKVAMQTKHVRIFMNTSLYGHEHFAHWWATLYAPEXTHEPKJUGDVWUDDPRQLUZMSZOJPSIKAIHLTONYXAULECXXKWFQOIKELWOHRVRUCXIAASKHMWTMAJEWGEESLWRTQKVHRRCDYXNT LDSUPXMQTQDFAQAPYBGXPOLOCLFQNGNKPKOBHZWHRXAWAWJKMTJSLDLNHMUGVVOPSAMRUJEYUOBPFNEHPZZCLPNZKWMTCXERPZRFKSXVEZTYCXFRHRGEITWHRRYPWSVAYBUHCERJXDCYAVICPTNBGIODLYLMEYLISEYNXNMCDPJJRCTLYNFMJZQNCLAGHUDVLYIGASGXSZYPZKLAWQUDVNTWGFFY FFSMQWUNUPZRJMTHACFELGHDZEJWFDWVPYOZEVEJKQWHQAHOCIYWGVLPSHFESCGEUCJGYLGDWPIWIDWZZXRUFXERABQJOXZALQOCSAYBRHXQQGUDADYSORTYZQPWGMBLNAQOFODSNXSZFURUNPMZGHTAJUJROIGMRKIZHSFUSKIZJJTLGOEEPBMIXISDHOAIFNFEKKSLEXSJLSGLCYYFEQBKIZZTQQ XBQZAPXAAIFQEIXELQEZGFEPCKFPGXULLAHXTSRXDEMKFKABUTAABSLNQBNMXNEPODPGAORYJXCHCGKECLJVRBPRLHORREEIZOBSHDSCETTTNFTSMQPQIJBLKNZDMXOTRBNMTKHHCZQQMSLOAXJQKRHDGZVGITHYGVDXRTVBJEAHYBYRYKJAVXPOKHFFMEPHAGFOOPFNKQAUGYLVPWUJUPCUGGIXGR AMELUTEPYILBIUOCKKUUBJROQFTXMZRLXBAMHSDTEKRRIKZUFNLGTQAEUINMBPYTWXULQNIIRXHHGQDPENXAJNWXULFBNKBRINUMTRBFWBYVNKNKDFR 

I am trying to find the smallest substring containing the letters ABCDA .

I tried the regex approach.

 console.log(str.match(/[A].*?[B].*?[C].*?[D].*?[A]/gm).sort((a, b) => a.length - b.length)[0]); 

This works, but it only finds the lines in which ABCDA appears (in that order). So he won’t find a substring where the letters appear in the following order: BCDAA

I am trying to change my regex for an account. How can I do this without using | and enter all the different cases?

+5
source share
3 answers

You can not.

Consider a special case: suppose the letters you are looking for are A , A and B At some point, your regex will definitely have B However, the parts to the left and right of B are independent of each other, so you cannot refer to each other. How much A matched in the subexpression to the right of B depends on the number of matches of A on the left. This is not possible with regular expressions, so you have to deploy all the different orders, which can be many!

Another popular example illustrating the problem is matching parentheses with parentheses. It is not possible to write a regular expression stating that in this line a sequence of opening brackets is followed by a sequence of closing brackets of the same length. The reason for this is that to count the brackets you need a stack machine as opposed to a finite state machine, but regular expressions are limited to patterns that can be matched using FSM.

+3
source

Perhaps it is not as clear as using regex can be (well, for me the regex is never clear: D) you can use brute force (not so rude)

Create an index of the "right" points of your string (those with letters), and repeat the double-loop cycle, getting substrings containing at least 5 of these points, checking that they are valid solutions. This may not be the most effective way, but it is easy to implement, understand, and possibly optimize.

 var haystack=""; var needle="ABCD"; var size=haystack.length; var candidate_substring=""; var minimal_length=size; var solutions=new Array(); var points=Array(); for(var i=0;i<size;i++){ if(needle.indexOf(haystack[i])>-1) points.push(i); } var limit_i= points.length-4; var limit_k= points.length; for (var i=0;i<limit_i;i++){ for(var k=i;k<limit_k;k++){ if(points[k]-points[i]+1<=minimal_length){ candidate_substring=haystack.substr(points[i],points[k]-points[i]+1); if(is_valid(candidate_substring)){ solutions.push(candidate_substring); if(candidate_substring.length < minimal_length) minimal_length=candidate_substring.length; } } } } document.write('<p>Solution length:'+minimal_length+'<p>'); for(var i=0;i<solutions.length;i++){ if(solutions[i].length<=minimal_length) document.write('<p>Solution:'+solutions[i]+'<p>'); } function is_valid(candidate_substring){ //verify we've got all characters for(var j=0;j<candidate_substring.length;j++){ if(candidate_substring.indexOf(needle.charAt(j))<0) return false; } //...and verify we have two "A" if(candidate_substring.indexOf("A")==candidate_substring.lastIndexOf("A")) return false; return true; } 
+1
source

This algorithm does not use regex, but also found both solutions.

 var haystack = ''; var needle = 'ABCDA'; // the order of letters doesn't matter var letters = {}; needle.split('').forEach(function(ch) { letters[ch] = letters[ch] || 0; letters[ch]++; }); var shortestSubstringLength = haystack.length; var shortestSubstrings = []; // storage for found substrings var startingPos = 0; var length; var currentPos; var notFound; var letterKeys = Object.keys(letters); // unique leters do { lettersLeft = JSON.parse(JSON.stringify(letters)); // copy letters count object notFound = false; posStart = haystack.length; posEnd = 0; letterKeys.forEach(function(ch) { currentPos = startingPos; while (!notFound && lettersLeft[ch] > 0) { currentPos = haystack.indexOf(ch, currentPos); if (currentPos >= 0) { lettersLeft[ch]--; posStart = Math.min(currentPos, posStart); posEnd = Math.max(currentPos, posEnd); currentPos++; } else { notFound = true; } } }); if (!notFound) { length = posEnd - posStart + 1; startingPos = posStart + 1; // starting position for next iteration } if (!notFound && length === shortestSubstringLength) { shortestSubstrings.push(haystack.substr(posStart, length)); } if (!notFound && length < shortestSubstringLength) { shortestSubstrings = [haystack.substr(posStart, length)]; shortestSubstringLength = length; } } while (!notFound); console.log(shortestSubstrings); 
+1
source

Source: https://habr.com/ru/post/1238762/


All Articles