A positive lookahead that (also) matches an empty string

Question

I am an intern working with some Groovy code, and I came across the following pattern:

(?=(^\w)*)(\w)+(?=(^\w)*)

Basically, it just finds words (contiguous runs of word characters) so that punctuation marks, etc. can be stripped off. Is there a reason not to just use this pattern instead?

\w+

Since this is not my code, I assume there may have been a reason to use something so ridiculously complicated, but it also seems very inefficient. Is there a difference between the two? They seem to give the same results at http://regexpal.com/ .

regex groovy regex-lookarounds

Alex hall Jun 11 '13 at 12:09

1 answer

mr.spuratic · Accepted Answer · 2013-06-11T12:24:02+0000

The answer to the question "why not just use \w+" is capture groups. That does not explain any possible subtlety or logic in the regular expression, though.
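To make the comparison concrete, here is a minimal sketch in Java (Groovy uses the same java.util.regex engine). The class name WordMatches, the helper names and the sample sentence are made up for illustration. It collects the full matches of both patterns and reports how many capture groups each one defines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordMatches {
    // The pattern from the question and the proposed simplification.
    static final String COMPLEX = "(?=(^\\w)*)(\\w)+(?=(^\\w)*)";
    static final String SIMPLE = "\\w+";

    // Collect the full text (group 0) of every match of regex in input.
    static List<String> matches(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // Number of capture groups the pattern defines.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        String input = "Hello, world! It's 42."; // hypothetical sample text
        System.out.println(matches(COMPLEX, input));
        System.out.println(matches(SIMPLE, input));
        System.out.println(groupCount(COMPLEX) + " groups vs " + groupCount(SIMPLE));
    }
}
```

On this sample the two lists of matches are identical; only the number of capture groups differs, so the patterns are interchangeable only if the calling code never reads groups 1 to 3.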

The (optional) prefix and suffix lookaheads were perhaps partly put in place for possible future use. As noted by m.buettner, ^\w was most likely meant to be [^\w]; as written, the final group never matches. (There may be cases with multi-line input, see Pattern Matching Flags, but I do not see one myself, since \w+ will not both consume input and match an end of line.)
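A small sketch of what the three groups actually hold, again in Java and with made-up names (GroupContents, groups) and sample input. It assumes the pattern is used exactly as written in the question, without the MULTILINE flag.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupContents {
    static final Pattern COMPLEX = Pattern.compile("(?=(^\\w)*)(\\w)+(?=(^\\w)*)");

    // For each match, record {whole match, group 1, group 2, group 3}.
    // Groups that did not take part in the match are reported as null.
    static List<String[]> groups(String input) {
        List<String[]> rows = new ArrayList<>();
        Matcher m = COMPLEX.matcher(input);
        while (m.find()) {
            rows.add(new String[] { m.group(), m.group(1), m.group(2), m.group(3) });
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical sample input.
        for (String[] row : groups("Hello, world!")) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Here group 1 is only set for a word that starts at the very beginning of the input, group 2 holds the last character of each word, and group 3 stays null: the position right after \w+ always follows a word character, so ^ cannot match there.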

The use of both (?=) and * suggests that the author was perhaps not well acquainted with regexes. Lookarounds are usually used to constrain a match (which * effectively undoes here, since it lets the lookahead succeed on an empty string) or to optimize matching.
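The effect of * inside the lookahead can be checked in isolation. The sketch below (Java; the class name EmptyLookahead, the helper countMatches and the input string are made up for illustration) counts zero-width matches of the lookahead on its own, once with * and once with + for comparison.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmptyLookahead {
    // Count how many matches of regex the engine finds in input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String input = "a, b"; // hypothetical sample, 4 characters long
        // With *, the lookahead accepts zero repetitions, so it succeeds
        // at every position, including the end of the input.
        System.out.println(countMatches("(?=(^\\w)*)", input));
        // With +, it only succeeds where ^\w really matches.
        System.out.println(countMatches("(?=(^\\w)+)", input));
    }
}
```

With * the lookahead succeeds at all five positions of the four-character input; with + it succeeds only at position 0. A lookahead that cannot fail places no constraint on the match, which is why removing it does not change which words are found.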

A polite interpretation would be that the regex "evolved" during development and was left with some unneeded subpatterns ...
