Making a regex stop at the first subpattern found

Let us consider the following two examples.

preg_match('/^(\pL+)(?:bcd|cd|d)$/u', 'abcd', $matches);
preg_match('/^(\pL+)(?:d|cd|bcd)$/u', 'abcd', $matches);

Both examples return 'abc' as $matches[1].

Why doesn't the regex stop at the first alternative it finds in the non-capturing group? Is it possible to stop at 'bcd' and get 'a' as $matches[1]?

3 answers

Yes, by making the + quantifier non-greedy (lazy):

preg_match('/^(\pL+?)(?:bcd|cd|d)$/u', 'abcd', $matches);
preg_match('/^(\pL+?)(?:d|cd|bcd)$/u', 'abcd', $matches);
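To see the difference, the greedy and lazy variants can be run side by side. This is only a small sketch; the helper name firstGroup is made up for this example and is not part of the original answer.

```php
<?php
// Hypothetical helper (an assumption for this example): returns capture
// group 1, or null when the pattern does not match.
function firstGroup(string $pattern, string $subject): ?string
{
    return preg_match($pattern, $subject, $m) === 1 ? $m[1] : null;
}

$subject = 'abcd';

// Greedy: \pL+ takes as many letters as it can and gives back as few as possible.
$greedy = firstGroup('/^(\pL+)(?:bcd|cd|d)$/u', $subject);

// Lazy: \pL+? takes as few letters as it can and extends only when forced to.
$lazy = firstGroup('/^(\pL+?)(?:bcd|cd|d)$/u', $subject);

// Same lazy pattern with the alternatives in the other order.
$lazyReordered = firstGroup('/^(\pL+?)(?:d|cd|bcd)$/u', $subject);

var_dump($greedy, $lazy, $lazyReordered);
```

With the lazy quantifier the order of the alternatives does not change the captured group for this input, because the group is already tried after the first letter.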

To complement the other answers, here is a schematic description of what happens:

str   | pattern              | state     | description
------+----------------------+-----------+------------------------------------------
abcd  | ^(\pL+)(?:bcd|cd|d)$ | SUCCESS   | all letters are matched by \pL+ (greedy)
abcd  | ^(\pL+)(?:bcd|cd|d)$ | FAIL      | bcd: there are no more characters
abcd  | ^(\pL+)(?:bcd|cd|d)$ | FAIL      | cd: same
abcd  | ^(\pL+)(?:bcd|cd|d)$ | FAIL      | d: same
abcd  | ^(\pL+)(?:bcd|cd|d)$ | BACKTRACK | \pL+ gives one character back
abcd  | ^(\pL+)(?:bcd|cd|d)$ | FAIL      | bcd: characters mismatch
abcd  | ^(\pL+)(?:bcd|cd|d)$ | FAIL      | cd: same
abcd  | ^(\pL+)(?:bcd|cd|d)$ | SUCCESS   | d matches
abcd  | ^(\pL+)(?:bcd|cd|d)$ | SUCCESS   | $ matches at the end of the string
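The trace above can be mimicked outside the regex engine. The sketch below is a simplified model, not how PCRE is actually implemented: the hypothetical function splitPrefix tries prefix lengths from longest to shortest (greedy) or from shortest to longest (lazy) and accepts the first prefix whose remainder equals one of the alternatives. It assumes the subject consists only of single-byte letters.

```php
<?php
// Simplified model of ^(\pL+)(?:alt1|alt2|...)$ (an illustration only).
// Assumption: $subject contains only single-byte letters, so every
// non-empty prefix would be accepted by \pL+.
function splitPrefix(string $subject, array $alternatives, bool $greedy): ?string
{
    $n = strlen($subject);
    if ($n === 0) {
        return null; // the + quantifier needs at least one character
    }

    // Candidate prefix lengths: 1..n for lazy, n..1 for greedy.
    $lengths = range(1, $n);
    if ($greedy) {
        $lengths = array_reverse($lengths);
    }

    foreach ($lengths as $len) {
        $rest = substr($subject, $len);
        // Try the alternatives in order; the remainder must equal one of
        // them exactly because of the $ anchor.
        foreach ($alternatives as $alt) {
            if ($rest === $alt) {
                return substr($subject, 0, $len);
            }
        }
        // No alternative fits: "backtrack" by moving on to the next length.
    }

    return null;
}

$greedyPrefix = splitPrefix('abcd', ['bcd', 'cd', 'd'], true);
$lazyPrefix   = splitPrefix('abcd', ['bcd', 'cd', 'd'], false);
var_dump($greedyPrefix, $lazyPrefix);
```

In this model the greedy search stops at the prefix 'abc' and the lazy one at 'a', the same captures the two regexes produce.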

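As the trace shows, the greedy quantifier first takes every letter and then gives characters back one at a time, so the match succeeds with the shortest alternative, d, whatever the order of the alternatives.

With a lazy quantifier the opposite happens: \pL+? takes a single letter, the group is tried right away, bcd matches, and the regex never needs to consume b.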


You can use:

preg_match('/^(\pL+?)(?>bcd|cd|d)$/u', 'abcd', $matches);
print_r($matches);

OUTPUT:

Array
(
    [0] => abcd
    [1] => a
)
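For this input it is the lazy quantifier +? that changes the result; the atomic group (?>...) on its own does not. A small check, offered as a sketch with variable names chosen for this example:

```php
<?php
$subject = 'abcd';

// Lazy quantifier + atomic group (the pattern from the answer above).
preg_match('/^(\pL+?)(?>bcd|cd|d)$/u', $subject, $lazyAtomic);

// Greedy quantifier + atomic group: only the group type differs from
// the pattern in the question.
preg_match('/^(\pL+)(?>bcd|cd|d)$/u', $subject, $greedyAtomic);

// Lazy quantifier + plain non-capturing group.
preg_match('/^(\pL+?)(?:bcd|cd|d)$/u', $subject, $lazyPlain);

// Prints: a abc a
echo $lazyAtomic[1], ' ', $greedyAtomic[1], ' ', $lazyPlain[1], PHP_EOL;
```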

Source: https://habr.com/ru/post/1531196/

