Regular expression in java to search for duplicate consecutive words

Question

I saw this as an answer for finding duplicate words in a string, but when I use it, it treats "This" and "is" as a match and deletes "is".

Regex

 "\\b(\\w+)\\b\\s+\\1"

Any idea why this is happening?

Here is the code I use to remove the duplicates:

    public static String RemoveDuplicateWords(String input) {
        String originalText = input;
        String output = "";
        Pattern p = Pattern.compile("\\b(\\w+)\\b\\s+\\b\\1\\b", Pattern.MULTILINE + Pattern.CASE_INSENSITIVE);
        //Pattern p = Pattern.compile("\\b(\\w+)\\b\\s+\\1", Pattern.MULTILINE + Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(input);
        if (!m.find())
            output = "No duplicates found, no changes made to data";
        else {
            while (m.find()) {
                if (output == "")
                    output = input.replaceFirst(m.group(), m.group(1));
                else
                    output = output.replaceAll(m.group(), m.group(1));
            }
            input = output;
            m = p.matcher(input);
            while (m.find()) {
                output = "";
                if (output == "")
                    output = input.replaceAll(m.group(), m.group(1));
                else
                    output = output.replaceAll(m.group(), m.group(1));
            }
        }
        return output;
    }
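One likely contributor (this is our reading of the code, not something the question confirms): String.replaceFirst and String.replaceAll compile m.group() as a brand-new regex, and that regex has no \b anchors. Once a real pair such as "is is" is found, replaceAll("is is", "is") also rewrites the "is is" that spans "This is". Also, the initial if (!m.find()) consumes the first match, which the following while (m.find()) loop never sees. The sketch below (class name, method names and input are hypothetical) reproduces the symptom and shows letting the Matcher do the replacement instead, so the anchors still apply.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceAllPitfall {

    // Same pattern as in the question, written with doubled backslashes.
    private static final Pattern DUP =
            Pattern.compile("\\b(\\w+)\\b\\s+\\b\\1\\b", Pattern.CASE_INSENSITIVE);

    // Mimics the question's approach: find a match, then hand m.group() to
    // String.replaceAll, which re-parses it as a new regex WITHOUT \b anchors.
    static String naiveReplace(String input) {
        Matcher m = DUP.matcher(input);
        if (!m.find()) {
            return input;
        }
        return input.replaceAll(m.group(), m.group(1));
    }

    // Lets the matcher replace every match itself; "$1" is the captured word,
    // and the \b anchors of DUP stay in effect.
    static String anchoredReplace(String input) {
        return DUP.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        String input = "This is fine and this is is wrong";
        System.out.println(naiveReplace(input));    // This fine and this is wrong
        System.out.println(anchoredReplace(input)); // This is fine and this is wrong
    }
}
```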

user1190265 Feb 05 '12 at 6:01

5 answers

Fahim parkar · Answer 1 · 2012-02-05T06:24:44+0000

You should have used \b(\w+)\b\s+\b\1\b instead (note the extra \b before and after \1).

Hope this is what you want ...
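To see what the trailing \b changes, here is a small comparison of the question's pattern and the suggested one on a hypothetical input "This is isolated". Without the final \b, \1 ("is") is allowed to match just the start of "isolated"; with it, the repeat has to end at a word boundary.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TrailingBoundaryDemo {

    // Pattern from the question: nothing stops \1 from matching a word prefix.
    static final Pattern WITHOUT_END_BOUNDARY =
            Pattern.compile("\\b(\\w+)\\b\\s+\\1");

    // Suggested pattern: the \b after \1 requires the repeated word to end there.
    static final Pattern WITH_END_BOUNDARY =
            Pattern.compile("\\b(\\w+)\\b\\s+\\b\\1\\b");

    // Returns the first match, or null if the pattern does not match at all.
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "This is isolated";
        System.out.println(firstMatch(WITHOUT_END_BOUNDARY, input)); // is is
        System.out.println(firstMatch(WITH_END_BOUNDARY, input));    // null
    }
}
```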

Update 1

Okay, here is the code and the result you get, i.e. the final string after duplicate removal:

    import java.util.regex.*;

    public class MyDup {
        public static void main(String[] args) {
            String input = "This This is text text another another";
            String originalText = input;
            String output = "";
            Pattern p = Pattern.compile("\\b(\\w+)\\b\\s+\\b\\1\\b", Pattern.MULTILINE + Pattern.CASE_INSENSITIVE);
            Matcher m = p.matcher(input);
            System.out.println(m);
            if (!m.find())
                output = "No duplicates found, no changes made to data";
            else {
                while (m.find()) {
                    if (output.isEmpty()) {
                        output = input.replaceFirst(m.group(), m.group(1));
                    } else {
                        output = output.replaceAll(m.group(), m.group(1));
                    }
                }
                input = output;
                m = p.matcher(input);
                while (m.find()) {
                    output = "";
                    if (output.isEmpty()) {
                        output = input.replaceAll(m.group(), m.group(1));
                    } else {
                        output = output.replaceAll(m.group(), m.group(1));
                    }
                }
            }
            System.out.println("After removing duplicate the final string is " + output);
        }
    }

Run this code and look at the output; it should answer your question.

Note

In output you replace the duplicate pair with a single word, don't you?

When I put System.out.println(m.group() + " : " + m.group(1)); inside the first if branch, I get text text : text, i.e. the duplicate pair is replaced with a single word.

    else {
        while (m.find()) {
            if (output.isEmpty()) {
                System.out.println(m.group() + " : " + m.group(1));
                output = input.replaceFirst(m.group(), m.group(1));
            } else {
                output = output.replaceAll(m.group(), m.group(1));
            }
        }
        // ... rest of the code unchanged
    }
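Because the initial if (!m.find()) already consumes the first match, the println inside the loop never shows "This This". A minimal sketch (class and method names are hypothetical) that lists every pair the same pattern finds on the answer's sample input:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ShowMatches {

    // Collects "whole match : captured word" for every match, the same
    // information the println above prints one line at a time.
    static List<String> describeMatches(String input) {
        Pattern p = Pattern.compile("\\b(\\w+)\\b\\s+\\b\\1\\b", Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(input);
        List<String> lines = new ArrayList<>();
        while (m.find()) {
            lines.add(m.group() + " : " + m.group(1));
        }
        return lines;
    }

    public static void main(String[] args) {
        describeMatches("This This is text text another another")
                .forEach(System.out::println);
        // This This : This
        // text text : text
        // another another : another
    }
}
```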

Hope you now see what is happening. :)

Good luck !!! Hurrah!!!

Mina samy · Answer 2 · 2016-05-10T16:12:09+0000

Try the following:

    // requires java.util.regex.Matcher and java.util.regex.Pattern
    String pattern = "(?i)\\b([a-z]+)\\b(?:\\s+\\1\\b)+";
    Pattern r = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);

    String input = "your string";
    Matcher m = r.matcher(input);
    while (m.find()) {
        input = input.replaceAll(m.group(0), m.group(1));
    }
    System.out.println(input);
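The same pattern also works as a single Matcher.replaceAll("$1") call, which avoids re-parsing each match as a regex inside the loop. A sketch under that assumption (class name and input are hypothetical); note that [a-z] with CASE_INSENSITIVE covers letters only, so repeated numbers are left alone.

```java
import java.util.regex.Pattern;

public class CollapseRuns {

    // Answer 2's pattern: a word followed by one or more repeats of itself.
    // With CASE_INSENSITIVE, the back-reference \1 also ignores case.
    private static final Pattern RUN =
            Pattern.compile("\\b([a-z]+)\\b(?:\\s+\\1\\b)+", Pattern.CASE_INSENSITIVE);

    // One pass over the input; each whole run is replaced by its first word.
    static String collapse(String input) {
        return RUN.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(collapse("Hello hello hello world world")); // Hello world
    }
}
```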

user5393067 · Answer 3 · 2015-12-09T11:29:31+0000

The following pattern will match duplicate words even for any number of occurrences.

 Pattern.compile("\\b(\\w+)(\\b\\W+\\b\\1\\b)*", Pattern.MULTILINE+Pattern.CASE_INSENSITIVE);

For example, "This is my buddy buddy" becomes "This is my buddy".

Also, with this pattern a single while (m.find()) loop is enough.

imbond · Answer 4 · 2016-08-30T07:32:49+0000

 \b(\w+)(\b\W+\1\b)*

Explanation:

    \b     : a word boundary
    (\w+)  : one or more word characters (letter, digit, underscore), captured as group 1

Once a word has been captured, the second group matches its repetitions:

    (      : group starts
    \b     : a word boundary
    \W+    : one or more non-word characters
    \1     : the captured word again (back-reference)
    \b     : a word boundary, so the repeat is not matched when it is only the start of a longer word
    )*     : group ends; it may occur zero or more times
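The role of the last \b can be checked with a quick sketch (class name and inputs are hypothetical): "the" next to "theory" is left alone, while a genuine "the the" collapses.

```java
import java.util.regex.Pattern;

public class BoundaryAfterBackref {

    // Answer 4's pattern written as a Java string literal.
    private static final Pattern P = Pattern.compile("\\b(\\w+)(\\b\\W+\\1\\b)*");

    // Replaces each word plus its immediate repeats with the word alone.
    static String dedupe(String input) {
        return P.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        // The trailing \b keeps "the" from matching the start of "theory".
        System.out.println(dedupe("the theory"));     // the theory
        System.out.println(dedupe("the the theory")); // the theory
    }
}
```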

anubhava · Answer 5 · 2012-02-05T10:32:49+0000

I believe this is the regular expression you should use to detect the same word appearing twice in a row, separated by one or more non-word characters:

 Pattern p = Pattern.compile("\\b(\\w+)\\b\\W+\\b\\1\\b", Pattern.CASE_INSENSITIVE);
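This pattern matches exactly one pair, so a single replaceAll("$1") pass turns "go go go" into "go go". One way to handle longer runs, sketched here under that assumption (class and method names are hypothetical), is to repeat the replacement until the string stops changing; the (...)+ or (...)* variants in the other answers avoid the loop altogether.

```java
import java.util.regex.Pattern;

public class PairPattern {

    // Answer 5's pattern: a word and one repeat, separated by non-word chars.
    static final Pattern PAIR =
            Pattern.compile("\\b(\\w+)\\b\\W+\\b\\1\\b", Pattern.CASE_INSENSITIVE);

    // Each pass collapses pairs; looping until nothing changes also removes
    // runs of three or more. Terminates because every change shortens the string.
    static String dedupe(String input) {
        String previous;
        String current = input;
        do {
            previous = current;
            current = PAIR.matcher(previous).replaceAll("$1");
        } while (!current.equals(previous));
        return current;
    }

    public static void main(String[] args) {
        System.out.println(PAIR.matcher("go go go").replaceAll("$1")); // go go
        System.out.println(dedupe("go go go, stop"));                   // go, stop
    }
}
```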
