Substring duplication

Is there an efficient way to find duplicate substring? Here, a duplicate means that two identical substrings, close to each other, have the same value without overlapping. For example, the source line:

ABCDDEFGHFGH

'D' and 'FGH' are duplicated. "F" appear twice in the sequence, however they are not close to each other, therefore they are not duplicated. therefore our algorithm will return ['D', 'FGH']. I want to know if there is an elegant algorithm instead of brute force?

+4
source share
3 answers

, Θ(n)

+3

( / ), (#):

  string source = @"ABCDDEFGHFGH";

  string[] result = Regex
    .Matches(source, @"(.+)\1")
    .OfType<Match>()
    .Select(match => match.Groups[1].Value)
    .ToArray(); 

(.+) - group of any (at least 1) characters
\1   - the same group (group #1) repeated 

  Console.Write(string.Join(", ", result));     

  D, FGH

, . "AAAA", "AA", "A", , , "AA".

+1

- , , , , . JS.

function getNborDupes(s){
  var cl = 0,  // cursor left
      cr = 0,  // cursor right
      ts = "", // test string
     res = []; // result array
  while (cl < s.length){
    cr = cl;
    while (++cr < s.length){
      ts = s.slice(cl,cr);  // ts starting from cl to cr (char @ cr excluded)
      
      // check ts with subst from cr to cr + ts.length (char @ cr + ts.length excluded)
      // if they match push it to result advance cursors to cl + ts.length and continue
      
      ts === s.substr(cr,ts.length) && (res.push(ts), cl = cr += ts.length);
    }
  cl++;
  }
  return res;
}

var str = "ABCDDEFGHFGH";
console.log(getNborDupes(str));

ts .

A
AB
ABC
ABCD
ABCDD
ABCDDE
ABCDDEF
ABCDDEFG
ABCDDEFGH
ABCDDEFGHF
ABCDDEFGHFG
B
BC
BCD
BCDD
BCDDE
BCDDEF
BCDDEFG
BCDDEFGH
BCDDEFGHF
BCDDEFGHFG
C
CD
CDD
CDDE
CDDEF
CDDEFG
CDDEFGH
CDDEFGHF
CDDEFGHFG
D
E
EF
EFG
EFGH
EFGHF
EFGHFG
F
FG
FGH

Although the part cl = cr += ts.lengthdecides whether to restart the search before or after the corresponding substring. At the moment, the above code; "ABABABAB"will return ["AB","AB"], but if you do this cr = cl += ts.length, then you should expect that the result will be ["AB", "AB", "AB"].

+1
source

Source: https://habr.com/ru/post/1664553/


All Articles