Java regex performance: reluctant quantifier or character class?

Which of these is more efficient, or (if they are equivalent) more readable? I am trying to match everything inside a pair of parentheses.

 Pattern p1 = Pattern.compile("\\([^)]*\\)");
 Pattern p2 = Pattern.compile("\\(.*?\\)");

To me, the second one reads better, but it uses a potentially confusing reluctant quantifier, and I'm not sure whether that costs performance.

EDIT

Don't miss the answer showing that this one is even better:

 Pattern p3 = Pattern.compile("\\([^)]*+\\)"); 
2 answers

This one performs better than p2, the non-greedy approach, which leads to backtracking:

 Pattern p1 = Pattern.compile("\\([^)]*\\)"); 

Have a look at this article.
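One rough way to see how much work each pattern causes is to count how often the engine reads a character from the input. The sketch below wraps the input in a hypothetical `CountingInput` class (not part of the JDK) that counts `charAt` calls. The counts depend on the JDK's regex implementation, so treat them as a relative indication only, not as a benchmark.

```java
import java.util.regex.Pattern;

// Hypothetical helper: a CharSequence that counts how many characters are read from it.
class CountingInput implements CharSequence {
    private final String text;
    long reads; // number of charAt calls made so far

    CountingInput(String text) { this.text = text; }

    @Override public int length() { return text.length(); }
    @Override public char charAt(int index) { reads++; return text.charAt(index); }
    @Override public CharSequence subSequence(int start, int end) { return text.substring(start, end); }
    @Override public String toString() { return text; }

    // Runs a single find() with the given regex and returns the number of character reads.
    static long readsFor(String regex, String text) {
        CountingInput input = new CountingInput(text);
        Pattern.compile(regex).matcher(input).find();
        return input.reads;
    }

    public static void main(String[] args) {
        // Build "(xxx...x)" with 1000 characters between the parentheses.
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < 1000; i++) sb.append('x');
        String text = sb.append(')').toString();

        System.out.println("p1 reads: " + readsFor("\\([^)]*\\)", text));
        System.out.println("p2 reads: " + readsFor("\\(.*?\\)", text));
    }
}
```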


\([^)]*\) will be faster, although the difference won't be noticeable if the input is small. A bigger gain is likely if you make [^)]* possessive: [^)]*+ . That way the regex engine does not keep track of all the characters [^)]* matched in case it needs to backtrack (which cannot help with [^)]*\) anyway, since giving characters back never exposes a ) to match). Making a pattern possessive causes the regex engine to not remember the characters that the pattern has matched.
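As a quick check that the possessive variant matches the same text as the other two, here is a minimal sketch. The class name `ParenPatterns`, the constant names, and the sample inputs are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ParenPatterns {
    static final Pattern GREEDY_CLASS = Pattern.compile("\\([^)]*\\)");      // p1
    static final Pattern RELUCTANT_DOT = Pattern.compile("\\(.*?\\)");       // p2
    static final Pattern POSSESSIVE_CLASS = Pattern.compile("\\([^)]*+\\)"); // p3

    // Returns the first match of the pattern in the input, or null if there is none.
    static String firstMatch(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String closed = "call(a, b) and more";  // has a closing parenthesis
        String unclosed = "call(a, b and more"; // no closing parenthesis, so no match is possible
        for (Pattern p : new Pattern[] {GREEDY_CLASS, RELUCTANT_DOT, POSSESSIVE_CLASS}) {
            System.out.println(p.pattern() + " -> " + firstMatch(p, closed)
                    + " / " + firstMatch(p, unclosed));
        }
    }
}
```

On inputs like these, all three patterns should agree on what they match. The possessive form only changes how much state the engine keeps while matching.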

Again, this may not be noticeable, but if your input gets large(r), I'm pretty sure* the difference between .*? and [^)]* is smaller than the difference between [^)]* and [^)]*+ .

* run some tests to be sure!
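A crude way to run such a test is sketched below. The class `RegexTimingSketch`, its method names, the input size, and the iteration count are all arbitrary choices for illustration. It uses plain `System.nanoTime()` with no JIT warm-up, so the numbers are only indicative, and a proper harness such as JMH would give more trustworthy results.

```java
import java.util.regex.Pattern;

class RegexTimingSketch {
    // Builds "(xxx...x)" with bodyLength characters between the parentheses.
    static String buildInput(int bodyLength) {
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < bodyLength; i++) sb.append('x');
        return sb.append(')').toString();
    }

    // Returns the elapsed nanoseconds for `iterations` find() calls on the input.
    static long timeNanos(Pattern pattern, String input, int iterations) {
        int hits = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            if (pattern.matcher(input).find()) hits++;
        }
        long elapsed = System.nanoTime() - start;
        // Sanity check, which also keeps the loop result in use.
        if (hits != iterations) throw new IllegalStateException("pattern did not match every time");
        return elapsed;
    }

    public static void main(String[] args) {
        String input = buildInput(10_000);
        String[] regexes = {"\\([^)]*\\)", "\\(.*?\\)", "\\([^)]*+\\)"};
        for (String regex : regexes) {
            long nanos = timeNanos(Pattern.compile(regex), input, 1_000);
            System.out.println(regex + " : " + (nanos / 1_000_000) + " ms");
        }
    }
}
```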


Source: https://habr.com/ru/post/1437961/

