How to remove duplicate letters in Java using regular expressions, case-insensitively

I am trying to replace any run of duplicate letters with the lower-case version of that letter (in Java). For instance:

I need a function that produces:

    bob -> bob
    bOb -> bob
    bOOb -> bob
    bOob -> bob
    boOb -> bob
    bob -> bob
    Bob -> Bob
    bOb -> bob

However, I was not able to do this with regular expressions (in Java).

I tried the following:

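    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String regex = "([A-Za-z])\\1+";
    String str = "bOob";
    Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
    Matcher matcher = pattern.matcher(str);
    // $1 is the first character of the run, in whatever case it had
    System.out.println(matcher.replaceAll("$1"));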

However, this returns bOb, not bob (it does work on boOb).

I also tried:

    Pattern pattern = Pattern.compile("(?i)([A-Za-z0-9])(?=\\1)", Pattern.CASE_INSENSITIVE);
    Matcher matcher = pattern.matcher(str);
    return matcher.replaceAll("");

This solves one problem (now bOob -> bob) but introduces another, because it now maps boOb to bOb.
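To make the difference between the two attempts concrete, here is a minimal sketch that runs both on the two problem inputs. The class and method names (`AttemptsDemo`, `keepFirst`, `keepLast`) are made up for illustration and are not part of the original code.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
final class AttemptsDemo {

    // First attempt: replace each case-insensitive run with $1,
    // i.e. the FIRST character of the run, in its original case.
    static String keepFirst(String s) {
        return Pattern.compile("([A-Za-z])\\1+", Pattern.CASE_INSENSITIVE)
                .matcher(s)
                .replaceAll("$1");
    }

    // Second attempt: delete every character that is followed by the same
    // letter, which leaves the LAST character of each run, in its original case.
    static String keepLast(String s) {
        return Pattern.compile("([A-Za-z0-9])(?=\\1)", Pattern.CASE_INSENSITIVE)
                .matcher(s)
                .replaceAll("");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"bOob", "boOb"}) {
            System.out.println(s + " -> first: " + keepFirst(s) + ", last: " + keepLast(s));
        }
    }
}
```

Neither version changes the case of the surviving character, so each one only produces bob when the character it happens to keep is already lower case.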

NOTE: it should also display BOobOoboObOoObooOoOoOoOoOOb → Bobobobobob.

I feel that at this point it would be easiest to loop over the string and apply some logic to each character, but I did not want to give up on regular expressions just yet. If there is a solution using regular expressions, is it likely to be more efficient than a loop over each character?

Thanks in advance!

PS: I know that I could just lower-case the whole string before passing it in, but this is not what I want because it gives:

Bob → bob

2 answers

Use Matcher#group() instead of $1 here:

    if (matcher.find()) {
        System.out.println(matcher.replaceAll(matcher.group(1).toLowerCase()));
    }

This lets you call toLowerCase() on the capture.

EDIT: (in response to the OP's comments)

Matcher#group(n) corresponds to $n: both refer to the nth capture group. So group(1) and $1 both capture O here, except that with group(1) you can call toLowerCase() on the result.

There is no explicit loop; replaceAll() does the looping itself. Matcher#find() is needed only to initialize the groups, so that group(1) returns the capture before replaceAll() is called.

But it also means that the capture stays the same for every replacement. That happens to satisfy your requirements, but the capture would have to be refreshed for a string like BOobbOobboObbOoObbooOoOoOoOoOObb (note the double b). The loop then has to be driven by Matcher#find(), which means replaceAll() gets traded for replaceFirst().

    String regex = "([A-Za-z])\\1+";
    String str = "BOobbOobboObbOoObbooOoOoOoOoOObb";
    Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
    Matcher matcher = pattern.matcher(str);
    while (matcher.find()) {
        str = matcher.replaceFirst(matcher.start() > 0
                ? matcher.group(1).toLowerCase()
                : matcher.group(1));
        matcher.reset(str);
    }
    System.out.println(str); // Bobobobobob

This uses Matcher#start() to check whether the match is at the beginning of the input, in which case the case is left untouched.
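The loop above rescans the string from the start after every replacement. As a possible alternative (not part of the original answer), the same rule can be sketched in a single pass with Matcher#appendReplacement() and Matcher#appendTail(). The class name `SinglePassCollapse` and the method name `collapse` are hypothetical, and using `Locale.ROOT` for lower-casing is my own assumption to keep the result independent of the default locale.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; a sketch, not the answer's original code.
final class SinglePassCollapse {

    // A letter followed by one or more repeats of itself, ignoring case.
    private static final Pattern RUN =
            Pattern.compile("([A-Za-z])\\1+", Pattern.CASE_INSENSITIVE);

    static String collapse(String input) {
        Matcher m = RUN.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String first = m.group(1);
            // Keep the original case for a run at index 0, lower-case it otherwise.
            String replacement = m.start() > 0 ? first.toLowerCase(Locale.ROOT) : first;
            // The replacement is a single ASCII letter, so it contains no '$' or '\'.
            m.appendReplacement(out, replacement);
        }
        m.appendTail(out); // copy whatever follows the last match
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(collapse("BOobbOobboObbOoObbooOoOoOoOoOObb"));
    }
}
```

Because each match is a maximal run, two collapsed runs can never become adjacent copies of the same letter, so one pass should give the same result as the repeated replaceFirst() loop.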


I think this is the code I was looking for (based on the accepted answer):

    public String removeRepeatedLetters(String str, boolean caseSensitive) {
        if (caseSensitive) {
            return this.removeRepeatedLetters(str); // uses the case-sensitive version
        } else {
            Pattern patternRep = Pattern.compile("([A-Za-z])(\\1+)", Pattern.CASE_INSENSITIVE);
            Matcher matcher = patternRep.matcher(str);
            String output = str;
            while (matcher.find()) {
                String matchStr = matcher.group(1);
                output = matcher.replaceFirst(matchStr.toLowerCase());
                matcher = patternRep.matcher(output);
                matcher.reset();
            }
            return output;
        }
    }

What it does is replace any run of duplicate letters (upper case or not) with a single lower-case one.

I think it is very close to working the way I want, although it gives Bbob -> bob. Since that is not the same case as Bob, I doubt it will matter much for what I am using it for.

By the way, if anyone can see how to optimize this, feel free to comment! The .reset() call bothers me a bit, and I'm not sure it is necessary.
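On the .reset() question: a Matcher freshly returned by Pattern#matcher() already starts in its initial state, so calling reset() right after creating it should be redundant. For comparison with the regex versions, here is a minimal regex-free sketch of the character loop mentioned in the question. The class name `LoopCollapse` and its methods are hypothetical, and the decision to keep the original case of a run at index 0 is an assumption borrowed from the accepted answer, so this sketch gives Bbob -> Bob rather than bob. No timing was measured, so treat it as a readability alternative rather than a proven optimization.

```java
// Hypothetical helper; a sketch of the plain-loop approach.
final class LoopCollapse {

    static String collapse(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            int j = i + 1;
            if (isAsciiLetter(c)) {
                // Extend j over every following copy of the same letter, ignoring case.
                while (j < s.length()
                        && isAsciiLetter(s.charAt(j))
                        && Character.toLowerCase(s.charAt(j)) == Character.toLowerCase(c)) {
                    j++;
                }
            }
            boolean isRun = j - i > 1;
            // Single characters are copied unchanged; a run collapses to one
            // character, lower-cased unless the run starts the string.
            out.append(isRun && i > 0 ? Character.toLowerCase(c) : c);
            i = j;
        }
        return out.toString();
    }

    private static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    public static void main(String[] args) {
        System.out.println(collapse("BOobOoboObOoObooOoOoOoOoOOb"));
    }
}
```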


Source: https://habr.com/ru/post/1493879/