Java regex to remove duplicate substrings from string

Question

I am trying to create a regex that "reduces" consecutive repeated substrings in a string in Java. For example, for the following input:

The big black dog big black dog is a friendly friendly dog who lives nearby nearby.

I would like to get the following output:

The big black dog is a friendly dog who lives nearby.

This is the code I have so far:

String input = "The big black dog big black dog is a friendly friendly dog who lives nearby nearby.";

Pattern dupPattern = Pattern.compile("((\\b\\w+\\b\\s)+)\\1+", Pattern.CASE_INSENSITIVE);
Matcher matcher = dupPattern.matcher(input);

while (matcher.find()) {
    input = input.replace(matcher.group(), matcher.group(1));
}

This works well for every repeated substring except the one at the end of the sentence:

The big black dog is a friendly dog who lives nearby nearby.

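I understand that my regex requires whitespace after every word in the substring, so it cannot detect a repetition that ends in a period instead of whitespace. Changing the regex to accept either whitespace or a period does not help, because that only matches when a period follows each copy of the repeated part ("nearby.nearby.").

Can anyone suggest how to handle this case?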
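A quick check, offered as a sketch rather than as part of the original post, shows why the last repetition survives. The pattern from the question demands `\s` after every word, and the second "nearby" is followed by a period. The class and method names below are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; only the regex itself comes from the question.
class TrailingDuplicateDemo {
    // The pattern from the question: every word in the group must be followed by whitespace.
    private static final Pattern QUESTION_PATTERN =
            Pattern.compile("((\\b\\w+\\b\\s)+)\\1+", Pattern.CASE_INSENSITIVE);

    // Returns true if the question's pattern finds a repeated substring anywhere in the text.
    static boolean hasDuplicate(String text) {
        return QUESTION_PATTERN.matcher(text).find();
    }

    public static void main(String[] args) {
        // The second "nearby" is followed by whitespace, so the backreference can match.
        System.out.println(hasDuplicate("nearby nearby is"));
        // The second "nearby" is followed by '.', so "nearby " cannot repeat.
        System.out.println(hasDuplicate("lives nearby nearby."));
    }
}
```

Group 1 captures "nearby " including the trailing space, so the backreference needs that exact text again and fails on "nearby.".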

+4

java string regex duplicates

ak_charlie, Jul 31 '16 at 11:42

2 Answers

Building on the suggestions from @Thomas Ayoub and @Matt:

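public class Test2 {
    // A run of word characters and spaces that starts at a word boundary
    // and is immediately followed by an identical copy of itself.
    private static final String DUPLICATE = "\\b([ \\w]+)\\1";

    public static void main(String[] args) {
        String input = "The big big black dog big black dog is a friendly friendly dog who lives nearby nearby.";
        String result = input.replaceAll(DUPLICATE, "$1");
        // Repeat until a pass changes nothing: collapsing "big big" first
        // exposes the longer duplicate "big black dog big black dog".
        while (!input.equals(result)) {
            input = result;
            result = input.replaceAll(DUPLICATE, "$1");
        }
        System.out.println(result);
    }
}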

+2

Gearon, Jul 31 '16 at 12:12

Thomas Ayoub · Accepted Answer · 2016-07-31T11:52:23+0000

input = input.replaceAll("([ \\w]+)\\1", "$1");

Full example:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Ideone
{
    public static void main (String[] args)
    {
        String input = "The big black dog big black dog is a friendly friendly dog who lives nearby nearby.";

        // Group 1 captures a run of word characters and spaces;
        // \1 requires that same run to repeat immediately afterwards.
        Pattern dupPattern = Pattern.compile("([ \\w]+)\\1", Pattern.CASE_INSENSITIVE);
        Matcher matcher = dupPattern.matcher(input);

        // Keep collapsing until no doubled run is left in the current string.
        while (matcher.find()) {
            input = matcher.replaceAll("$1");
            matcher.reset(input);
        }
        System.out.println(input);
    }
}
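One caveat with both answers: `([ \w]+)\1` treats a doubled letter as a repeated substring, so "hello" becomes "helo". The `\b`-prefixed variant still turns "see the theme" into "see theme", because " the" is followed by the " the" of " theme". A possible variation, sketched here under the assumption that only whole words should be collapsed, anchors both copies on word boundaries. The class name `DuplicatePhrases` and the method `collapse` are hypothetical and do not come from either answer.

```java
import java.util.regex.Pattern;

// Hypothetical helper; a sketch of a whole-word variant, not the accepted answer's method.
class DuplicatePhrases {
    // Group 1: one or more whole words separated by whitespace.
    // Each repeat must follow after whitespace and end on a word boundary.
    private static final Pattern DUPLICATE = Pattern.compile(
            "\\b(\\w+(?:\\s+\\w+)*)(?:\\s+\\1\\b)+", Pattern.CASE_INSENSITIVE);

    // Collapses consecutive repeated words or phrases, keeping the first copy.
    static String collapse(String text) {
        String previous;
        do {
            previous = text;
            // One pass can expose a new duplicate, so loop until nothing changes.
            text = DUPLICATE.matcher(text).replaceAll("$1");
        } while (!text.equals(previous));
        return text;
    }

    public static void main(String[] args) {
        System.out.println(collapse(
                "The big big black dog big black dog is a friendly friendly dog who lives nearby nearby."));
        System.out.println(collapse("hello, see the theme"));
    }
}
```

Because the repeat is matched as `\s+\1\b`, the trailing period after "nearby" no longer matters, and the whitespace between the copies is dropped along with the duplicate.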
