Regular expression in Java to search for duplicate consecutive words

I saw this as an answer for finding duplicate words in a string. But when I use it, it treats This and is as a match and deletes is.

Regex

 "\\b(\\w+)\\b\\s+\\1" 

Any idea why this is happening?

Here is the code I use to remove the duplicates:

    public static String RemoveDuplicateWords(String input) {
        String originalText = input;
        String output = "";
        Pattern p = Pattern.compile("\\b(\\w+)\\b\\s+\\b\\1\\b", Pattern.MULTILINE+Pattern.CASE_INSENSITIVE);
        //Pattern p = Pattern.compile("\\b(\\w+)\\b\\s+\\1", Pattern.MULTILINE+Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(input);
        if (!m.find())
            output = "No duplicates found, no changes made to data";
        else {
            while (m.find()) {
                if (output == "")
                    output = input.replaceFirst(m.group(), m.group(1));
                else
                    output = output.replaceAll(m.group(), m.group(1));
            }
            input = output;
            m = p.matcher(input);
            while (m.find()) {
                output = "";
                if (output == "")
                    output = input.replaceAll(m.group(), m.group(1));
                else
                    output = output.replaceAll(m.group(), m.group(1));
            }
        }
        return output;
    }
+6
5 answers

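You should have used \b(\w+)\b\s+\b\1\b instead. The extra \b after \1 means the repeated text has to be a whole word, so \1 can no longer match just the beginning of a longer word.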

Hope this is what you want ...
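To make the difference between the two patterns concrete, here is a minimal sketch. The class name, the helper firstMatch and the sample sentence are made up for illustration; only the two regular expressions come from this thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BoundaryDemo {
    // Pattern from the question: nothing anchors the end of the backreference.
    static final Pattern LOOSE =
            Pattern.compile("\\b(\\w+)\\b\\s+\\1", Pattern.CASE_INSENSITIVE);
    // Pattern from this answer: the backreference must be a whole word.
    static final Pattern STRICT =
            Pattern.compile("\\b(\\w+)\\b\\s+\\b\\1\\b", Pattern.CASE_INSENSITIVE);

    /** Returns the first match of the pattern in the text, or null if there is none. */
    static String firstMatch(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "This is island life"; // hypothetical input
        System.out.println(firstMatch(LOOSE, text));  // is is  ("is" plus the start of "island")
        System.out.println(firstMatch(STRICT, text)); // null
    }
}
```

With the loose pattern, "is island" counts as a duplicate because \1 only has to match the first two letters of "island".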

Update 1

Okay, the output you get is the final string after duplicate removal:

    import java.util.regex.*;

    public class MyDup {
        public static void main (String args[]) {
            String input="This This is text text another another";
            String originalText = input;
            String output = "";
            Pattern p = Pattern.compile("\\b(\\w+)\\b\\s+\\b\\1\\b", Pattern.MULTILINE+Pattern.CASE_INSENSITIVE);
            Matcher m = p.matcher(input);
            System.out.println(m);
            // Note: this find() already consumes the first match ("This This")
            if (!m.find())
                output = "No duplicates found, no changes made to data";
            else {
                // First pass: handles the matches after the first one
                while (m.find()) {
                    // == compares references; it is true here only because output still holds the "" literal
                    if (output == "") {
                        output = input.replaceFirst(m.group(), m.group(1));
                    } else {
                        output = output.replaceAll(m.group(), m.group(1));
                    }
                }
                // Second pass: re-scan the result to catch the match skipped above
                input = output;
                m = p.matcher(input);
                while (m.find()) {
                    output = "";
                    if (output == "") {
                        output = input.replaceAll(m.group(), m.group(1));
                    } else {
                        output = output.replaceAll(m.group(), m.group(1));
                    }
                }
            }
            System.out.println("After removing duplicate the final string is " + output);
        }
    }

Run this code and see what output you get; that should answer your question.

Note

In the output, each duplicate is replaced with a single word, isn't it?

When I put System.out.println(m.group() + " : " + m.group(1)); as the first statement inside the if branch, I get text text : text , that is, the duplicate is replaced with a single word.

    else {
        while (m.find()) {
            if (output == "") {
                System.out.println(m.group() + " : " + m.group(1));
                output = input.replaceFirst(m.group(), m.group(1));
            } else {

Hope you now see what is happening ... :)

Good luck !!! Hurrah!!!
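One thing to watch in the loop above: String.replaceFirst and String.replaceAll treat m.group() as a new regular expression and search the whole string again, without the word boundaries, so a match like "is is" can also hit the tail of "This". A possible way around that, sketched here under my own naming (DedupOnce and removeDuplicateWords are not from the original code), is to let the Matcher do the replacement with "$1" and repeat until nothing changes:

```java
import java.util.regex.Pattern;

final class DedupOnce {
    private static final Pattern DUP =
            Pattern.compile("\\b(\\w+)\\b\\s+\\b\\1\\b", Pattern.CASE_INSENSITIVE);

    /** Collapses consecutive duplicate words; repeats so that runs of three or more also shrink to one. */
    static String removeDuplicateWords(String input) {
        String previous;
        String current = input;
        do {
            previous = current;
            // "$1" inserts the captured word, so each "word word" pair becomes "word"
            current = DUP.matcher(previous).replaceAll("$1");
        } while (!current.equals(previous));
        return current;
    }

    public static void main(String[] args) {
        // prints: This is text another
        System.out.println(removeDuplicateWords("This This is text text another another"));
    }
}
```

The do/while is only needed because this pattern matches exactly two words at a time.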

+6

Try the following:

    String pattern = "(?i)\\b([a-z]+)\\b(?:\\s+\\1\\b)+";
    Pattern r = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
    String input = "your string";
    Matcher m = r.matcher(input);
    while (m.find()) {
        input = input.replaceAll(m.group(0), m.group(1));
    }
    System.out.println(input);
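A small sketch of what the trailing (?:...)+ buys you, reading the character class as [a-z]. The class and method names are hypothetical, and I use Matcher.replaceAll("$1") rather than the loop above to keep it short:

```java
import java.util.regex.Pattern;

final class RunCollapse {
    // One letters-only word followed by one or more repeats of itself.
    static final Pattern RUN =
            Pattern.compile("\\b([a-z]+)\\b(?:\\s+\\1\\b)+", Pattern.CASE_INSENSITIVE);

    /** Replaces every run of a repeated word with its first occurrence, in a single pass. */
    static String collapse(String input) {
        return RUN.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(collapse("Hello hello HELLO world")); // Hello world
        System.out.println(collapse("room 42 42"));              // room 42 42 (digits are not in [a-z])
    }
}
```

Because the whole run is one match, no second pass is needed. Note that [a-z]+ skips words containing digits or underscores, unlike \w+.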
+6

The following pattern matches a duplicated word no matter how many times it repeats.

 Pattern.compile("\\b(\\w+)(\\b\\W+\\b\\1\\b)*", Pattern.MULTILINE+Pattern.CASE_INSENSITIVE); 

For example, "This is my buddy buddy buddy" becomes "This is my buddy".

In addition, a single "while (m.find())" loop is enough with this pattern.
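Here is one way that single loop could look. This is only a sketch: the class name and the use of appendReplacement/appendTail are my choice, the answer itself does not say how the replacement is done.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SinglePassDedup {
    private static final Pattern P = Pattern.compile(
            "\\b(\\w+)(\\b\\W+\\b\\1\\b)*",
            Pattern.MULTILINE + Pattern.CASE_INSENSITIVE);

    /** Rebuilds the string in one pass, keeping only the first word of each run. */
    static String dedupe(String input) {
        Matcher m = P.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Every word matches (the repeat group is optional); runs are replaced by group 1.
            m.appendReplacement(sb, Matcher.quoteReplacement(m.group(1)));
        }
        m.appendTail(sb); // copy whatever follows the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(dedupe("This is my buddy buddy buddy")); // This is my buddy
    }
}
```

Since the repeats are separated by \W+, the separator between duplicates (spaces, commas and so on) is dropped together with the repeated words.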

+5
 \b(\w+)(\b\W+\1\b)* 

Explanation:

    \b    : Any word boundary
    (\w+) : One or more word characters (letter, digit, underscore), captured as group 1

Once a word has been matched, the second group looks for repetitions of it.

    (   : Grouping starts
    \b  : Any word boundary
    \W+ : One or more non-word characters
    \1  : The same word that group 1 captured
    \b  : Rejects the match if the repeated word is only the start of a longer word
    )   : Grouping ends
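To see the two groups at work, the following sketch lists only the words that actually repeat. The class and method names are hypothetical. It relies on the fact that a group which took part in no iteration of * yields null from Matcher.group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RepeatedWords {
    private static final Pattern P = Pattern.compile("\\b(\\w+)(\\b\\W+\\1\\b)*");

    /** Returns each word that is immediately followed by at least one copy of itself. */
    static List<String> repeatedWords(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = P.matcher(text);
        while (m.find()) {
            // group(2) is null when the repeat group matched zero times
            if (m.group(2) != null) {
                result.add(m.group(1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints: [one, three]
        System.out.println(repeatedWords("one one two three three three"));
    }
}
```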


+4

I believe this is the regular expression you should use to detect two consecutive identical words separated by any number of non-word characters:

 Pattern p = Pattern.compile("\\b(\\w+)\\b\\W+\\b\\1\\b", Pattern.CASE_INSENSITIVE); 
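A quick sketch of what \W+ changes compared with \s+. The class name, the helper and the sample sentence are invented for the example:

```java
import java.util.regex.Pattern;

final class SeparatorDemo {
    // Duplicates separated by whitespace only.
    static final Pattern WHITESPACE_ONLY =
            Pattern.compile("\\b(\\w+)\\b\\s+\\b\\1\\b", Pattern.CASE_INSENSITIVE);
    // Duplicates separated by any non-word characters (spaces, commas, dashes, ...).
    static final Pattern ANY_NON_WORD =
            Pattern.compile("\\b(\\w+)\\b\\W+\\b\\1\\b", Pattern.CASE_INSENSITIVE);

    /** True if the text contains at least one duplicate according to the given pattern. */
    static boolean hasDuplicate(Pattern p, String text) {
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "Well, well, here we are"; // hypothetical input
        System.out.println(hasDuplicate(WHITESPACE_ONLY, text)); // false
        System.out.println(hasDuplicate(ANY_NON_WORD, text));    // true
    }
}
```

Whether "Well, well" should count as a duplicate depends on what you are cleaning up, so pick the separator accordingly.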
0

Source: https://habr.com/ru/post/907682/

