A quick fix for this pattern would be
(.+?)\1+
Your regex failed because it anchored the repeating string to the start and end of the line, so it only allowed lines like abcabcabc, but not xabcabcabcx. In addition, the minimum length of the repeating string should be 1, not 0 (otherwise any line will match), hence .+? instead of .*?.
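A small sketch of both points. The original pattern is not reproduced here, so re.fullmatch is used as an assumed stand-in for "anchored at both ends"; the test strings are the ones mentioned above.

```python
import re

pattern = re.compile(r"(.+?)\1+")

# fullmatch behaves like anchoring the pattern at both ends of the string
# (an assumption standing in for the original, anchored regex).
anchored_ok = pattern.fullmatch("abcabcabc")
anchored_fail = pattern.fullmatch("xabcabcabcx")

# search lets the repeat start anywhere in the string.
unanchored = pattern.search("xabcabcabcx")

print(anchored_ok.group(1))    # the whole string is a repetition of one unit
print(anchored_fail)           # the leading/trailing x breaks the anchored match
print(unanchored.group(1))     # the unanchored pattern still finds the repeat

# With .*? the group may be empty, so a match is found even in a string
# that contains no repeat at all.
empty = re.search(r"(.*?)\1+", "xyz")
print(repr(empty.group(0)))
```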
In Python:
>>> import re
>>> r = re.compile(r"(.+?)\1+")
>>> r.findall("cbabababac")
['ba']
>>> r.findall("dabcdbcdbcdd")
['bcd']
But keep in mind that this regex only finds non-overlapping matches, so in the last example the solution d will not be found, although it is the shortest repeating string (the first d of the final dd was already consumed by the bcd match). Or look at this example, where it cannot find abcd because the abc part of the first abcd was consumed by the first match:
>>> r.findall("abcabcdabcd")
['abc']
In addition, it can return several matches, so you need to pick the shortest one in a second step:
>>> r.findall("abcdabcdabcabc")
['abcd', 'abc']
A better solution:
To allow the engine to find overlapping matches, use
(.+?)(?=\1)
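Here only the first copy is consumed; the repeat is merely checked by the lookahead. Note that group 1 itself is still consumed, so a repeat that begins inside an earlier match can still be skipped (for abaaba this returns only ['aba'], not a). It can also report the same string two or more times if it is repeated often enough, but it finds repeats that the first pattern misses: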
>>> r = re.compile(r"(.+?)(?=\1)")
>>> r.findall("dabcdbcdbcdd")
['bcd', 'bcd', 'd']
Therefore, you should compare the results by length and return the shortest:
>>> min(r.findall("dabcdbcdbcdd") or [""], key=len)
'd'
The or [""] part (thanks to JF Sebastian!) ensures that no ValueError is raised if there is no match at all.
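The steps above can be wrapped in a small helper. This is only a sketch: the name shortest_repeat is hypothetical, and the pattern is a variation that is not part of the answer above. It puts the whole expression inside the lookahead, (?=(.+?)\1), so that nothing is consumed and every start position is tried, including ones that (.+?)(?=\1) skips because they lie inside an earlier match. As before, . does not match newlines, so the search is effectively per line.

```python
import re

# Variation (an assumption, not the pattern from the answer): the whole
# expression sits inside a lookahead, so no characters are consumed and
# the engine tries every start position in the string.
_REPEAT_ANYWHERE = re.compile(r"(?=(.+?)\1)")

def shortest_repeat(s):
    """Return the shortest substring of s that is immediately followed
    by a copy of itself, or "" if there is none."""
    # `or [""]` avoids a ValueError from min() when nothing matches.
    return min(_REPEAT_ANYWHERE.findall(s) or [""], key=len)

# For comparison: the consuming version misses the 'aa' in the middle.
consuming = re.findall(r"(.+?)(?=\1)", "abaaba")

print(shortest_repeat("dabcdbcdbcdd"))
print(consuming, shortest_repeat("abaaba"))
print(repr(shortest_repeat("abcdef")))
```

The trade-off is that the zero-width version reports more candidates (one per start position that begins a repeat), which is harmless here because only the shortest one is kept.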