According to man pcrepattern:
If a pattern starts with .* or .{0,} and the PCRE_DOTALL option (equivalent to Perl's /s) is set, thus allowing the dot to match newlines, the pattern is implicitly anchored, because whatever follows will be tried against every character position in the subject string, so there is no point in retrying the overall match at any position after the first.
As the manpage goes on to note, this optimization cannot be used if the .* is inside capturing parentheses that are the subject of a back reference, since in that case a match at the start may fail where a later one succeeds. The same argument implies that the optimization is not valid when the .* is inside a zero-length assertion, as the pattern in the OP illustrates: a lookahead consumes nothing, so whatever follows it is still only tried at the first position, not at every position.
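To make the manpage's argument concrete, here is a small Python sketch. Python's re module does not perform PCRE's implicit-anchoring optimization, so the sketch emulates it. The hypothetical helper anchored_match tries the pattern only at offset 0 (Pattern.match), and unanchored_search tries every offset (Pattern.search). For a plain leading .* the two agree. For the manpage's back-reference example, and for the lookahead pattern (?=.*\d)\w{4,} (presumably the one from the question), they disagree, which is why anchoring is only safe in the first case.

```python
import re

# Hypothetical helpers. Python's re has no implicit-anchoring optimization,
# so "anchored" is emulated by trying offset 0 only (Pattern.match).
def anchored_match(pattern: str, subject: str):
    m = re.compile(pattern, re.DOTALL).match(subject)
    return m.group(0) if m else None

def unanchored_search(pattern: str, subject: str):
    # Tries every starting offset, as an ordinary unanchored regex does.
    m = re.compile(pattern, re.DOTALL).search(subject)
    return m.group(0) if m else None

cases = [
    (r".*\d", "ab\ncd1"),             # plain leading .*: anchoring is harmless
    (r"(.*)abc\1", "xyz123abc123"),   # back-referenced .*: the manpage's counterexample
    (r"(?=.*\d)\w{4,}", "!abcae20"),  # .* inside a lookahead
]

for pattern, subject in cases:
    print(f"{pattern!r}: anchored={anchored_match(pattern, subject)!r}, "
          f"unanchored={unanchored_search(pattern, subject)!r}")
```

In the last two cases the anchored attempt returns None, while the unanchored search finds 123abc123 and abcae20 respectively.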
It is not clear from the manpage whether a .* inside a lookahead is implicitly anchored, but it certainly seems to be (although that would be a bug, imho). For some reason, adding (?-s), which I would expect to turn off PCRE_DOTALL, does not change the behavior. However, changing the .* to something else does. In particular, changing it to [^\d]* causes the regex to produce the expected result:
    $ echo '!abcae20' | grep -P -o '(?=[^\d]*\d)\w{4,}'
    abcae20
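As a sanity check that the two lookaheads mean the same thing on a single line, the following Python sketch lists the offsets at which each one succeeds. grep feeds the engine one line at a time, so the subject contains no newlines, and [^\d] matching a newline does not matter. The sketch uses Python's re rather than PCRE, and lookahead_offsets is a hypothetical helper. The only point is that both spellings accept the same positions, so the difference in grep's output presumably comes from how the engine optimizes them, not from what they mean.

```python
import re

def lookahead_offsets(lookahead: str, subject: str) -> list[int]:
    """Hypothetical helper: offsets at which a zero-length lookahead succeeds."""
    pat = re.compile(lookahead)
    # Pattern.match(subject, pos) tries the lookahead exactly at offset pos.
    return [pos for pos in range(len(subject) + 1) if pat.match(subject, pos)]

subject = "!abcae20"
dot_star = lookahead_offsets(r"(?=.*\d)", subject)
not_digit_star = lookahead_offsets(r"(?=[^\d]*\d)", subject)
print(dot_star)
print(not_digit_star)
print("same offsets:", dot_star == not_digit_star)
```

Both lists cover every offset up to the last digit, and neither includes the end of the string, where no digit follows.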
It is at least interesting that there are cases where the lookahead assertion seems to work without producing implicit anchoring, which might cast some doubt on the above analysis. But it may just be an interaction with some other optimization. In particular,
    $ echo '!abcae20' | grep -P -o '(?=.*\d)a'
    a
    $
clearly could not work if the pattern were anchored. On the other hand, changing the a to [ab], which should not affect the match at all, makes it fail:
    $ echo '!abcae20' | grep -P -o '(?=.*\d)[ab]'
    $
(Many thanks to @perreal for an interesting discussion of this issue.)
Some of the observations that initially made me think this might be a bug were:
    $ echo '!abcde20' | grep -P -o '(?=.*\d)\w*'
    abcde20
    $ echo '!abcde20' | grep -P -o '(?=.*\d)\w+'
    $ echo '!abcde20' | grep -P -o '(?=.*\d)\w'
    $ echo '!abcde20' | grep -P -o '(?=.*\d)\w?'
    a
    b
    c
    d
    e
    2
    0
This all looks illogical, but it actually makes sense if the pattern is implicitly anchored. In the first and last cases (\w* and \w?), the pattern matches an empty string at the start of the input, because ! is not a word character. grep -o then retries the pattern at the next character position, where it succeeds. In the other two cases (\w+ and \w), the anchored pattern fails at the first position, so grep never retries it.
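The anchoring hypothesis can be checked with a rough Python model of the grep -o loop. This is an assumption about grep's behavior, not its actual code. The hypothetical grep_o prints each non-empty match, advances one character after an empty match, and stops at the first failed attempt. With anchored=True every attempt is pinned to the current offset (Pattern.match), emulating the suspected implicit anchor. With anchored=False it uses Pattern.search, as an unanchored regex should.

```python
import re

def grep_o(pattern: str, line: str, anchored: bool) -> list[str]:
    """Rough, hypothetical model of `grep -o` on a single line."""
    regex = re.compile(pattern)
    # Anchored: try only at the current offset. Unanchored: scan forward from it.
    attempt = regex.match if anchored else regex.search
    out: list[str] = []
    pos = 0
    while pos <= len(line):
        m = attempt(line, pos)
        if m is None:
            break                      # first failure ends the line
        if m.end() == m.start():
            pos = m.end() + 1          # empty match: retry one character further on
        else:
            out.append(m.group(0))     # non-empty match is printed
            pos = m.end()
    return out

subject = "!abcde20"
for quantified in [r"\w*", r"\w+", r"\w", r"\w?"]:
    pattern = r"(?=.*\d)" + quantified
    print(f"{pattern:12} anchored={grep_o(pattern, subject, True)} "
          f"unanchored={grep_o(pattern, subject, False)}")
```

With anchored=True the model reproduces the four grep outputs listed above. With anchored=False it gives abcde20 for both \w* and \w+, which is what one would naively expect. The model does not reproduce the (?=.*\d)a case, which fits the idea that some other optimization is also in play.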
However, I stand by my claim that implicit anchoring (if that is what is happening) is a bug, since the manpage is quite clear that this optimization should not change behavior. (Furthermore, it is contradicted by the successful match of (?=.*\d)a.) But it is possible that the bug is in the documentation, since, according to @perreal, Perl also rejects these matches, and PCRE is supposed to be Perl-compatible.