Java regex replaceAll not replacing all words

I have been playing with this regular expression in Java for a while and can't get it to work:

 (?:^| )(?:the|and|at|in|or|on|off|all|beside|under|over|next)(?: |$)

The following:

 pattern.matcher("the cat in the hat").replaceAll(" ") 

gives me " cat the hat" (with a leading space). Another example input is "the cat in of the next hat", which gives me " cat of next hat".
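For reference, here is a minimal, self-contained reproduction of the behaviour described above. The class and method names (StopWordRepro, strip) are hypothetical; the brackets in the output only make the leading space visible.

```java
import java.util.regex.Pattern;

class StopWordRepro {
    // The pattern from the question: start of input or a space, one of the
    // listed words, then a space or end of input.
    static final Pattern STOP_WORDS = Pattern.compile(
            "(?:^| )(?:the|and|at|in|or|on|off|all|beside|under|over|next)(?: |$)");

    // Replaces every match with a single space, as in the question.
    static String strip(String input) {
        return STOP_WORDS.matcher(input).replaceAll(" ");
    }

    public static void main(String[] args) {
        System.out.println("[" + strip("the cat in the hat") + "]");         // [ cat the hat]
        System.out.println("[" + strip("the cat in of the next hat") + "]"); // [ cat of next hat]
    }
}
```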

Is there any way to do this replacement without having to split the regex into a separate regex for each word and run the replacement on the string repeatedly?

2 answers

Yes, you can do this quite easily. You just need word boundaries (\b), which are what you are trying to describe with (?:^| ). Just use this instead:

 \b(?:the|and|at|in|or|on|off|all|beside|under|over|next)\b 
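A minimal sketch of this in Java, assuming the goal is to remove the words entirely and then tidy up the leftover spaces. The class name WordBoundaryStrip and the whitespace clean-up step are my own additions, not part of the answer. Inside a Java string literal the backslash must be doubled, so \b is written "\\b".

```java
import java.util.regex.Pattern;

class WordBoundaryStrip {
    // \b is zero-width: it matches between a word character and a non-word
    // character (or the ends of the input), so no spaces are consumed.
    static final Pattern STOP_WORDS = Pattern.compile(
            "\\b(?:the|and|at|in|or|on|off|all|beside|under|over|next)\\b");

    static String strip(String input) {
        // Remove the words, then collapse the runs of spaces they leave behind.
        // The clean-up is an extra step, not part of the answer's regex.
        return STOP_WORDS.matcher(input).replaceAll("")
                .replaceAll("\\s+", " ")
                .trim();
    }

    public static void main(String[] args) {
        System.out.println(strip("the cat in the hat"));         // cat hat
        System.out.println(strip("the cat in of the next hat")); // cat of hat
    }
}
```

The clean-up is needed because only the word itself is matched, so the spaces on either side of each removed word stay in the string.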

Your original didn't capture anything, but as mentioned in the comments, if you want to capture the matched words, you can use a capturing group instead of the non-capturing group:

 \b(the|and|at|in|or|on|off|all|beside|under|over|next)\b 
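With the capturing group, each matched word is available as group(1), and $1 can be used in the replacement string. A short sketch (the class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CapturedStopWords {
    // Capturing group 1 holds whichever listed word matched.
    static final Pattern STOP_WORDS = Pattern.compile(
            "\\b(the|and|at|in|or|on|off|all|beside|under|over|next)\\b");

    // Collects the captured words in order of appearance.
    static List<String> found(String input) {
        List<String> words = new ArrayList<>();
        Matcher m = STOP_WORDS.matcher(input);
        while (m.find()) {
            words.add(m.group(1));
        }
        return words;
    }

    // Uses the capture in the replacement: $1 is the matched word.
    static String mark(String input) {
        return STOP_WORDS.matcher(input).replaceAll("[$1]");
    }

    public static void main(String[] args) {
        System.out.println(found("the cat in the hat")); // [the, in, the]
        System.out.println(mark("the cat in the hat"));  // [the] cat [in] [the] hat
    }
}
```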

The problem with yours is that the leading and trailing spaces are included in the matches, and a character cannot be part of two matches.

So, with the input the_cat_in_the_hat (underscores replace spaces here to make the explanation clear):

  • First match: the_, remaining string: cat_in_the_hat
  • Second match: _in_, remaining string: the_hat
  • The second the is not matched: the space before it was already consumed by the second match, and it is not at the start of the (original) string.
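The consumption can be observed directly by printing each match with its start and end offsets. This is a sketch with hypothetical names; it uses the question's pattern unchanged and writes spaces as underscores to mirror the walkthrough:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ConsumedSpaces {
    // The question's pattern, unchanged.
    static final Pattern ORIGINAL = Pattern.compile(
            "(?:^| )(?:the|and|at|in|or|on|off|all|beside|under|over|next)(?: |$)");

    // Lists every match as "text @start-end", with spaces shown as underscores.
    static List<String> matches(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = ORIGINAL.matcher(input);
        while (m.find()) {
            // The next find() resumes at m.end(), so the trailing space of one
            // match can never serve as the leading space of the next.
            out.add(m.group().replace(' ', '_') + " @" + m.start() + "-" + m.end());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(matches("the cat in the hat")); // [the_ @0-4, _in_ @7-11]
    }
}
```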

Instead, you could use lookarounds, since they behave like conditions (i.e. an if) and do not consume any characters:

 (?<=^| )(?:the|and|at|in|or|on|off|all|beside|under|over|next)(?= |$) 


So you will have:

  • First match: the, remaining string: _cat_in_the_hat
  • Second match: in, remaining string: _the_hat
  • Third match: the, remaining string: _hat
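The same kind of check with the lookaround pattern (again a sketch with hypothetical names) shows that the matches now contain only the words, so the spaces between them remain available to the next match:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundMatches {
    // (?<=^| ) and (?= |$) only check the neighbouring characters;
    // they are not part of the match.
    static final Pattern LOOKAROUND = Pattern.compile(
            "(?<=^| )(?:the|and|at|in|or|on|off|all|beside|under|over|next)(?= |$)");

    // Lists every match as "text @start-end".
    static List<String> matches(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = LOOKAROUND.matcher(input);
        while (m.find()) {
            out.add(m.group() + " @" + m.start() + "-" + m.end());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(matches("the cat in the hat")); // [the @0-3, in @8-10, the @11-14]
    }
}
```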

But @JonathanMee's answer is the better solution, since word boundaries were designed for exactly this purpose ;)


Source: https://habr.com/ru/post/985274/

