Is it better to use lookbehind or capture groups?

I'm not sure whether one of them is "better" than the other, or why it would be, but I have an original string that looks like this:

$string = '/random_length_user/file.php'; 

There are two ways to match it: the first uses my new friend, lookbehind, and the second does not:

 preg_match("%(?<=^/)[^/]*%", $string, $capture);
 preg_match("%^/([^/]*)%", $string, $capture);

They return:

 Array ( [0] => random_length_user )
 Array ( [0] => /random_length_user [1] => random_length_user )

Essentially, I get the result I want in $capture[0] with the lookbehind, and in $capture[1] without it. So the question is: is there a reason to prefer one of these methods over the other?
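For reference, here is a minimal runnable version of the two calls, assuming a reasonably recent PHP with its bundled PCRE; the variable names $lookbehind and $grouped are only illustrative. The directory name comes from the whole match in the first case and from group 1 in the second:

```php
<?php
$string = '/random_length_user/file.php';

// Lookbehind: the leading slash is checked but not consumed,
// so the whole match ($lookbehind[0]) is just the directory name.
preg_match('%(?<=^/)[^/]*%', $string, $lookbehind);

// Capture group: the slash is part of the overall match,
// so the directory name has to be read from group 1.
preg_match('%^/([^/]*)%', $string, $grouped);

echo $lookbehind[0], "\n"; // random_length_user
echo $grouped[1], "\n";    // random_length_user
```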

2 answers

The problem is that the lookbehind approach is less flexible; it breaks down as soon as you start dealing with variable-length matches. For example, suppose you wanted to extract the file name in your example and didn't know the directory name. The capture-group technique still works fine:

 preg_match("%^/\w+/([^/]*)%", '/random_length_user/file.php', $m);
 Array ( [0] => /random_length_user/file.php [1] => file.php )

...but the lookbehind approach doesn't, because lookbehind expressions can only match a fixed number of characters. However, there is an even better alternative: \K, the MATCH POINT RESET operator. Wherever you put it, the regex engine pretends that the match really started at that point. This gives you the same result as lookbehind, without the fixed-length limitation:

 preg_match('%^/\w+/\K[^/]+$%', '/random_length_user/file.php', $m);
 Array ( [0] => file.php )
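A small PHP sketch putting the three techniques side by side on the same path (variable names are illustrative). The lookbehind variant is deliberately invalid: because \w+ has no upper bound, PCRE rejects the pattern at compile time, so preg_match() emits a warning (suppressed here with @) and returns false:

```php
<?php
$path = '/random_length_user/file.php';

// 1. Capture group: works, the file name ends up in group 1.
preg_match('%^/\w+/([^/]*)%', $path, $m);
$viaGroup = $m[1];

// 2. Lookbehind: \w+ makes the lookbehind unbounded, so the pattern
//    fails to compile; preg_match() warns (silenced by @) and returns false.
$lookbehindResult = @preg_match('%(?<=^/\w+/)[^/]+$%', $path);

// 3. \K: everything matched before \K is dropped from the reported match,
//    so the whole match ($m3[0]) is just the file name.
preg_match('%^/\w+/\K[^/]+$%', $path, $m3);
$viaK = $m3[0];

echo $viaGroup, "\n";          // file.php
var_dump($lookbehindResult);   // bool(false)
echo $viaK, "\n";              // file.php
```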

As far as I know, this feature is available only in Perl 5.10+ and in tools built on the PCRE library (for example, PHP's preg_ functions). For the PCRE documentation, see the man page and search (F3) for \K.


This probably doesn't matter with preg_match, but it does matter with preg_replace, because it affects what gets replaced.
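To illustrate, here is a sketch with a hypothetical task: renaming the user directory in the question's path to "bob". The slash that the lookbehind only asserts is left alone, while the slash consumed by the capture-group pattern is replaced along with the name unless you write it back into the replacement yourself. \K behaves like the lookbehind here:

```php
<?php
$path = '/random_length_user/file.php';

// Lookbehind: the slash is only asserted, not matched, so it survives.
$withLookbehind = preg_replace('%(?<=^/)[^/]+%', 'bob', $path);

// Capture group: the slash is part of the match, so it is replaced too...
$withGroup = preg_replace('%^/([^/]+)%', 'bob', $path);

// ...unless the replacement puts it back explicitly.
$withGroupFixed = preg_replace('%^/([^/]+)%', '/bob', $path);

// \K: the slash is matched but excluded from the reported match,
// so only the directory name is replaced.
$withK = preg_replace('%^/\K[^/]+%', 'bob', $path);

echo $withLookbehind, "\n"; // /bob/file.php
echo $withGroup, "\n";      // bob/file.php
echo $withGroupFixed, "\n"; // /bob/file.php
echo $withK, "\n";          // /bob/file.php
```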

It can also be a problem when you perform a global match, because a capture group consumes characters while a lookbehind does not.

Trivial example:

  • /(?<=a)a/g with 'aaaa' gives Array('a', 'a', 'a')
  • /(a)a/g with 'aaaa' gives Array('aa', 'aa')
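PHP has no /g modifier; preg_match_all() performs the global match. A sketch of the same trivial example in PHP (variable names are illustrative):

```php
<?php
$subject = 'aaaa';

// Lookbehind: each match consumes a single "a"; the preceding "a" is only
// checked, so matches can start at offsets 1, 2 and 3.
$n1 = preg_match_all('/(?<=a)a/', $subject, $lookbehind);

// Capture group: each match consumes two characters, so the engine resumes
// after both and only finds non-overlapping pairs.
$n2 = preg_match_all('/(a)a/', $subject, $grouped);

echo $n1, ': ', implode(', ', $lookbehind[0]), "\n"; // 3: a, a, a
echo $n2, ': ', implode(', ', $grouped[0]), "\n";    // 2: aa, aa
```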

Source: https://habr.com/ru/post/1299289/

