I am trying to create a regex to "reduce" repeated consecutive substrings in a string in Java. For example, for the following input:
The big black dog big black dog is a friendly friendly dog who lives nearby nearby.
I would like to get the following output:
The big black dog is a friendly dog who lives nearby.
This is the code I have so far:
String input = "The big black dog big black dog is a friendly friendly dog who lives nearby nearby.";
Pattern dupPattern = Pattern.compile("((\\b\\w+\\b\\s)+)\\1+", Pattern.CASE_INSENSITIVE);
Matcher matcher = dupPattern.matcher(input);
while (matcher.find()) {
input = input.replace(matcher.group(), matcher.group(1));
}
This works for every repeated substring except the one at the end of the sentence:
The big black dog is a friendly dog who lives nearby nearby.
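I understand that my regex requires whitespace after every word, so the final repetition, which ends with a period rather than a space, is not detected. Allowing a period in place of the whitespace did not help either, because the pattern then only matches when every repetition is followed by a period ( "nearby.nearby." ).
Is there a way to handle this case? Any pointers would be appreciated.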
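One possible direction, offered as a sketch rather than a definitive fix: keep the separator out of the captured group, so the group holds only the words and the whitespace is matched between repetitions instead of after each word. A trailing period then no longer matters. The class name `DedupSketch` and the method name `collapseRepeats` are hypothetical, and the pattern assumes that repetitions are separated by whitespace only and that "words" are runs of `\w` characters.

```java
import java.util.regex.Pattern;

class DedupSketch {
    // Group 1: one or more words separated by whitespace, with no trailing separator.
    // After it: one or more repetitions of "whitespace + same words", each ending
    // on a word boundary so that "is island" is not treated as "is is".
    private static final Pattern REPEATS = Pattern.compile(
            "\\b(\\w+(?:\\s+\\w+)*)(?:\\s+\\1\\b)+",
            Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: replaces every run of repeated phrases with a single copy.
    static String collapseRepeats(String input) {
        return REPEATS.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        String input = "The big black dog big black dog is a friendly friendly dog who lives nearby nearby.";
        // Prints: The big black dog is a friendly dog who lives nearby.
        System.out.println(collapseRepeats(input));
    }
}
```

Using `replaceAll` on the matcher also avoids modifying `input` inside the `find()` loop. Note that the nested quantifiers can backtrack heavily on long inputs, so this is probably best suited to short strings such as single sentences.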