Regular expressions: .*? vs. .*

I came across a PHP regular expression article that used (.*?) in its syntax. As far as I can see, it behaves the same as (.*).

Are there any advantages to using (.*?)? I can't see why someone would use it.

+4
2 answers

.* is greedy; .*? is not. However, the difference only matters in context. Given the patterns

<br/>(.*?)<br/> and <br/>(.*)<br/>, and the input <br/>test<br/>test2<br/>,

the pattern with .* will match <br/>test<br/>test2<br/>,

while the pattern with .*? will match only <br/>test<br/>.
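The difference can be checked directly. Below is a minimal sketch using Python's re module rather than PHP (an assumption made for illustration; for these two patterns PHP's preg functions should behave the same way). The variable names are hypothetical.

```python
import re

text = "<br/>test<br/>test2<br/>"

# Greedy: .* grabs as much as possible, so the match runs to the last <br/>.
greedy = re.search(r"<br/>(.*)<br/>", text)

# Non-greedy: .*? grabs as little as possible, so the match stops at the
# first <br/> that lets the whole pattern succeed.
lazy = re.search(r"<br/>(.*?)<br/>", text)

print(greedy.group(0))  # <br/>test<br/>test2<br/>
print(greedy.group(1))  # test<br/>test2
print(lazy.group(0))    # <br/>test<br/>
print(lazy.group(1))    # test
```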

Note: never use regex to parse complex HTML.

+7

In most regex flavors, the *? production is a non-greedy repetition. This means that .*? first tries to match the empty string, then, if that fails, one character, and so on, until the rest of the pattern matches. By contrast, the greedy production .* first tries to match the entire input and then, if that fails, gives back one character at a time.
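The order in which the two forms try things shows up in what they end up matching. The following is a small sketch in Python's re module (chosen here for illustration, not taken from the article in question); the input string and variable names are made up.

```python
import re

text = "abcabc"

# With nothing after it, the non-greedy form is satisfied by the empty
# string, while the greedy form consumes the whole input.
lazy_alone = re.match(r".*?", text).group(0)
greedy_alone = re.match(r".*", text).group(0)

# With a following "c", the non-greedy form grows one character at a time
# until the first "c" fits; the greedy form starts from the end of the input
# and backs off until the last "c" fits.
lazy_prefix = re.match(r"(.*?)c", text).group(1)
greedy_prefix = re.match(r"(.*)c", text).group(1)

print(repr(lazy_alone), repr(greedy_alone))    # '' 'abcabc'
print(repr(lazy_prefix), repr(greedy_prefix))  # 'ab' 'abcab'
```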

This concept applies only to regex engines that use recursive backtracking to match ambiguous expressions. In theory, the greedy and non-greedy forms accept exactly the same set of strings, but because they try different things first, one is likely to be much faster than the other on a given input.
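One way to read the "same set of strings" claim: when the whole input has to match, greedy and non-greedy patterns agree on every accept/reject decision, even though they explore candidates in a different order. A minimal sketch, assuming Python's re module and a hypothetical list of sample strings:

```python
import re

greedy = re.compile(r"a.*b")
lazy = re.compile(r"a.*?b")

samples = ["ab", "axxb", "abab", "a", "ba", "abx"]

# fullmatch succeeds only if the entire string matches, so the choice of
# quantifier cannot change the verdict, only the search order.
results = [
    (s, bool(greedy.fullmatch(s)), bool(lazy.fullmatch(s)))
    for s in samples
]

for s, g, l in results:
    print(f"{s!r}: greedy={g} lazy={l}")
```

This says nothing about speed; which form is faster depends on the engine and on where the match ends in the input.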

It can also be useful when capture groups (in recursive and NFA-based flavors alike) are used to extract information from the match. For example, an expression like

 "(.*?)" 

can be used to capture a quoted string. Since the subgroup is non-greedy, it stops at the first closing quotation mark, so you can be sure that no quotation marks are captured and the subgroup contains only the desired content.
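A short sketch of the quoted-string case, again in Python's re module as an illustrative assumption, with a made-up input line:

```python
import re

line = 'say "hello" and "bye"'

# Non-greedy: each capture ends at the nearest closing quotation mark.
lazy_quotes = re.findall(r'"(.*?)"', line)

# Greedy, for comparison: the capture runs to the last quotation mark on
# the line and swallows the quotes in between.
greedy_quotes = re.findall(r'"(.*)"', line)

print(lazy_quotes)    # ['hello', 'bye']
print(greedy_quotes)  # ['hello" and "bye']
```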

+8

Source: https://habr.com/ru/post/1338663/

