How to remove duplicate letters in java using (Regular expressions) and be case insensitive

Question

I am trying to replace any run of duplicate letters with a single lower-case version of that letter (in Java). For instance, I need a function that maps:

    bob -> bob
    bOb -> bob
    bOOb -> bob
    bOob -> bob
    boOb -> bob
    bob -> bob
    Bob -> Bob
    bOb -> bob

However, I was not able to do this with regular expressions (in Java).

I tried the following:

    String regex = "([A-Za-z])\\1+";
    String str = "bOob";
    Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
    Matcher matcher = pattern.matcher(str);
    System.out.println(matcher.replaceAll("$1"));

However, this returns bOb, not bob (although it does work on boOb).

I also tried:

    Pattern pattern = Pattern.compile("(?i)([A-Za-z0-9])(?=\\1)", Pattern.CASE_INSENSITIVE);
    Matcher matcher = pattern.matcher(str);
    return matcher.replaceAll("");

This solves one problem, since now bOob -> bob, but it introduces another: it now maps boOb to bOb, because the lookahead removes the first letter of the pair and keeps the second.

NOTE: it should also map BOobOoboObOoObooOoOoOoOoOOb → Bobobobobob.

At this point I feel it would be easiest to loop over the string and apply some logic to each character, but I did not want to give up on regular expressions just yet. If there is a solution using regular expressions, is it likely to be more efficient than a loop that goes through each character?

Thanks in advance!

PS: I know that I could simply lowercase the whole string, but that is not what I want, because it maps:

Bob → bob
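For comparison with the regex attempts above, the character-by-character loop mentioned in the question could look roughly like the sketch below. It assumes the rule implied by the examples: the first character keeps its case, a letter that repeats the previous character (ignoring case) is dropped, and everything else is lowercased. The class and method names are made up for illustration.

```java
class DedupLoopSketch {
    // Hypothetical helper, not part of the original post.
    static String collapse(String input) {
        if (input.isEmpty()) {
            return input;
        }
        StringBuilder out = new StringBuilder(input.length());
        char previous = input.charAt(0);
        out.append(previous); // the first character keeps its case
        for (int i = 1; i < input.length(); i++) {
            char current = input.charAt(i);
            // A letter equal to its predecessor, ignoring case, is a duplicate.
            boolean repeatsPrevious = Character.isLetter(current)
                    && Character.toLowerCase(current) == Character.toLowerCase(previous);
            if (!repeatsPrevious) {
                out.append(Character.toLowerCase(current));
            }
            previous = current;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(collapse("bOob")); // bob
        System.out.println(collapse("Bob"));  // Bob
        System.out.println(collapse("BOobOoboObOoObooOoOoOoOoOOb")); // Bobobobobob
    }
}
```

This makes a single pass over the input. Whether it is actually faster than a regex for realistic inputs would have to be measured.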

java regex case-insensitive

Charlie parker Jul 28 '13 at 4:08

2 answers

I think this is the code I was looking for (based on the accepted answer):

    public String removeRepeatedLetters(String str, boolean caseSensitive) {
        if (caseSensitive) {
            return this.removeRepeatedLetters(str); // uses the case-sensitive version
        } else {
            Pattern patternRep = Pattern.compile("([A-Za-z])(\\1+)", Pattern.CASE_INSENSITIVE);
            Matcher matcher = patternRep.matcher(str);
            String output = str;
            while (matcher.find()) {
                String matchStr = matcher.group(1);
                output = matcher.replaceFirst(matchStr.toLowerCase());
                matcher = patternRep.matcher(output);
                matcher.reset();
            }
            return output;
        }
    }

What it does is replace any run of repeated letters (upper or lower case) with a single lowercase one.

I think this is very close to what I want, although it maps Bbob -> bob. Since it still does not map Bob to bob, I doubt this will affect my use case much.

By the way, if anyone can see how to optimize this, feel free to comment! The .reset() call bothers me a bit, and I am not sure it is necessary.
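On the optimization question, one possible variation (a different technique from the code above, not something from the original answers) is to scan the input once with Matcher#appendReplacement and Matcher#appendTail instead of building a new matcher after every replacement. The sketch below keeps the same behaviour, including Bbob -> bob. The class name and the Locale.ROOT argument are my own additions.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SinglePassSketch {
    // Compiled once and reused; the pattern is the same as in the answer above.
    private static final Pattern REPEATED =
            Pattern.compile("([A-Za-z])\\1+", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: replaces each run of a repeated letter
    // with one lowercase copy of that letter, in a single scan.
    static String removeRepeatedLetters(String str) {
        Matcher matcher = REPEATED.matcher(str);
        StringBuffer out = new StringBuffer(str.length());
        while (matcher.find()) {
            // The replacement is a single ASCII letter, so it needs no escaping.
            matcher.appendReplacement(out, matcher.group(1).toLowerCase(Locale.ROOT));
        }
        matcher.appendTail(out); // copy whatever follows the last match
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeRepeatedLetters("bOob")); // bob
        System.out.println(removeRepeatedLetters("Bbob")); // bob
        System.out.println(removeRepeatedLetters("Bob"));  // Bob
    }
}
```

A single scan should be enough here because each match is greedy, so collapsing one run cannot create a new run next to it. As for .reset(): a matcher freshly returned by Pattern#matcher() already starts in its reset state, so that extra call looks redundant.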

Charlie parker Jul 28 '13 at 7:45

Ravi thapliyal · Accepted Answer · 2013-07-28T04:23:27+0000

Use Matcher#group() instead of $1 here:

    if (matcher.find()) {
        System.out.println(matcher.replaceAll(matcher.group(1).toLowerCase()));
    }

This lets you make use of toLowerCase().

EDIT: (in response to OP's comments)

Matcher#group(n) is the same as $n: both refer to the nth capture group. So group(1) and $1 both capture O, except that with group(1) you can apply toLowerCase().
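As a quick check of the point above, here is a small sketch that runs both variants on the bOob input from the question. The class and method names are invented for the illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupVersusDollarSketch {
    private static final Pattern REPEATED =
            Pattern.compile("([A-Za-z])\\1+", Pattern.CASE_INSENSITIVE);

    // "$1" is expanded inside replaceAll(), so the captured case is kept.
    static String withDollar(String str) {
        return REPEATED.matcher(str).replaceAll("$1");
    }

    // group(1) is an ordinary String, so it can be lowercased first.
    // find() has to succeed before group(1) can be read.
    static String withGroup(String str) {
        Matcher matcher = REPEATED.matcher(str);
        return matcher.find()
                ? matcher.replaceAll(matcher.group(1).toLowerCase())
                : str;
    }

    public static void main(String[] args) {
        System.out.println(withDollar("bOob")); // bOb
        System.out.println(withGroup("bOob"));  // bob
    }
}
```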

The looping is done by replaceAll(), not by find(). The call to Matcher#find() is only needed to initialize the groups, so that group(1) returns the capture before replaceAll() is called.

But this also means that the capture stays the same for every replacement. That satisfied your original requirement, but the capture has to be refreshed for a string like BOobbOobboObbOoObbooOoOoOoOoOObb (note the double b). The loop must now be driven by Matcher#find(), which means replaceAll() gets swapped for replaceFirst():

    String regex = "([A-Za-z])\\1+";
    String str = "BOobbOobboObbOoObbooOoOoOoOoOObb";
    Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
    Matcher matcher = pattern.matcher(str);
    while (matcher.find()) {
        str = matcher.replaceFirst(matcher.start() > 0
                ? matcher.group(1).toLowerCase()
                : matcher.group(1));
        matcher.reset(str);
    }
    System.out.println(str); // Bobobobobob

This uses Matcher#start() to determine whether the match is at the beginning of the input, in which case the letter's case is left untouched.
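To see why the loop matters once the input contains more than one kind of repeated letter, the sketch below puts the two approaches side by side on a shortened input, BOobb. The class and method names are made up; the second method is just the loop above wrapped in a function.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FixedCaptureSketch {
    private static final Pattern REPEATED =
            Pattern.compile("([A-Za-z])\\1+", Pattern.CASE_INSENSITIVE);

    // One find(), then replaceAll(): every run is replaced
    // with the letter captured from the FIRST run.
    static String replaceAllWithFirstCapture(String str) {
        Matcher matcher = REPEATED.matcher(str);
        return matcher.find()
                ? matcher.replaceAll(matcher.group(1).toLowerCase())
                : str;
    }

    // find() drives the loop, so group(1) is refreshed for each run.
    static String replaceRunByRun(String str) {
        Matcher matcher = REPEATED.matcher(str);
        while (matcher.find()) {
            String letter = matcher.start() > 0
                    ? matcher.group(1).toLowerCase()
                    : matcher.group(1); // a run at index 0 keeps its case
            str = matcher.replaceFirst(letter);
            matcher.reset(str);
        }
        return str;
    }

    public static void main(String[] args) {
        System.out.println(replaceAllWithFirstCapture("BOobb")); // Boo
        System.out.println(replaceRunByRun("BOobb"));            // Bob
    }
}
```

With the single capture, the bb run is also replaced by o, which is why the first variant only works when all repeated runs use the same letter.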
