A positive lookahead that (also) matches an empty string

I am an intern working with some Groovy code, and I came across the following pattern:

(?=(^\w)*)(\w)+(?=(^\w)*)

Basically, it just finds words (contiguous runs of word characters) in order to strip punctuation and the like. Is there any reason not to simply use this pattern instead?

\w+

Since this is not my code, I assume there may have been a reason to use something so ridiculously complicated, but it also looks very inefficient. Is there any difference between the two? They appear to give the same results at http://regexpal.com/ .

1 answer

The answer to the question of why not use just \w+ is capturing groups, although that does not explain whatever subtlety or logic the rest of the regular expression was meant to have.
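To make the point about capturing groups concrete, here is a minimal sketch in Java (Groovy uses the same java.util.regex engine). The class name, helper method and sample sentence are made up for illustration. It suggests that both patterns find the same words and differ only in the number of capturing groups they expose.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the code discussed in the question.
class WordPatternComparison {
    // The pattern from the question and the simple alternative.
    static final Pattern COMPLEX = Pattern.compile("(?=(^\\w)*)(\\w)+(?=(^\\w)*)");
    static final Pattern SIMPLE = Pattern.compile("\\w+");

    // Collects the full text of every match, ignoring capturing groups.
    static List<String> matches(Pattern pattern, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String sample = "Hello, world! It's 42."; // made-up sample input
        System.out.println(matches(COMPLEX, sample)); // [Hello, world, It, s, 42]
        System.out.println(matches(SIMPLE, sample));  // [Hello, world, It, s, 42]
        // The only visible difference: how many capturing groups each pattern has.
        System.out.println(COMPLEX.matcher(sample).groupCount()); // 3
        System.out.println(SIMPLE.matcher(sample).groupCount());  // 0
    }
}
```

If the surrounding code never reads the groups, the two patterns should be interchangeable for this kind of input.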

The prefix and suffix are (optionally) captured, in part for possible later use. As m.buettner noted, ^\w was most likely meant to be [^\w]; as written, ^ is an anchor, which means the second, final group never matches. (There might be cases with multi-line input, see Pattern Matching Flags, but I do not see one myself, since whatever follows the greedy \w+ cannot be a word character, even at a line boundary.)
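The behaviour of the groups can be checked with a small Java sketch. The class name, the describe helper and the input "foo, bar" are assumptions made for illustration only. With the default flags, group 1 is set only when a match starts at the very beginning of the input, group 2 holds the last word character, and group 3 stays unset.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for inspecting the groups of the questioned pattern.
class GroupInspection {
    static final Pattern COMPLEX = Pattern.compile("(?=(^\\w)*)(\\w)+(?=(^\\w)*)");

    // Returns "match|group1|group2|group3" for the n-th match (0-based),
    // or null if the input has fewer matches than that.
    static String describe(String input, int n) {
        Matcher m = COMPLEX.matcher(input);
        for (int i = 0; i <= n; i++) {
            if (!m.find()) {
                return null;
            }
        }
        // Unset groups are returned as null and print as "null".
        return m.group() + "|" + m.group(1) + "|" + m.group(2) + "|" + m.group(3);
    }

    public static void main(String[] args) {
        System.out.println(describe("foo, bar", 0)); // foo|f|o|null
        System.out.println(describe("foo, bar", 1)); // bar|null|r|null
    }
}
```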

The combination of (?=) and * suggests that the author was perhaps not very familiar with regexes. As a rule, lookarounds are used to constrain a match (which the * effectively undoes here) or to optimize matching.
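The effect of * inside a lookahead can be sketched in Java as follows. The two lookahead patterns here are simplified stand-ins chosen for illustration, not the pattern from the question, and the class and method names are hypothetical. A lookahead whose body may match the empty string always succeeds, so it no longer constrains anything.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the patterns below are illustrative stand-ins.
class LookaheadConstraint {
    // Collects the full text of every match of regex in input.
    static List<String> matches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String sample = "hi! there"; // made-up sample input
        // A real constraint: only words directly followed by '!'.
        System.out.println(matches("\\w+(?=!)", sample));  // [hi]
        // With '*', the lookahead can match nothing, so it always succeeds.
        System.out.println(matches("\\w+(?=!*)", sample)); // [hi, there]
        // Same result as having no lookahead at all.
        System.out.println(matches("\\w+", sample));       // [hi, there]
    }
}
```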

A charitable reading is that the regex was “changed” over the course of development and left with some unnecessary subpatterns ...


Source: https://habr.com/ru/post/1485609/

