[[:>:]] or [[:<:]] do not match

I am trying to use [[:>:]] in my regular expression, but it is not accepted, while other character classes, for example [[:digit:]] or [[:word:]], are. What's wrong?

Online demo

2 answers

This is a bug, because these constructs (the starting word boundary, [[:<:]], and the ending word boundary, [[:>:]]) are supported by the PCRE library itself:

 COMPATIBILITY FEATURE FOR WORD BOUNDARIES

 In the POSIX.2 compliant library that was included in 4.4BSD Unix, the ugly syntax [[:<:]] and [[:>:]] is used for matching "start of word" and "end of word". PCRE treats these items as follows:

   [[:<:]]  is converted to  \b(?=\w)
   [[:>:]]  is converted to  \b(?<=\w)

 Only these exact character sequences are recognized. A sequence such as [a[:<:]b] provokes error for an unrecognized POSIX class name. This support is not compatible with Perl. It is provided to help migrations from other environments, and is best not used in any new patterns. Note that \b matches at the start and the end of a word (see "Simple assertions" above), and in a Perl-style pattern the preceding or following character normally shows which is wanted, without the need for the assertions that are used above in order to give exactly the POSIX behaviour.

When used in PHP code, it works:

 if (preg_match_all('/[[:<:]]home[[:>:]]/', 'homeless and home', $m)) {
     print_r($m[0]);
 }

This prints Array ( [0] => home ): only the standalone word home is matched, not the home inside homeless. See the online PHP demo.

So it is the regex101.com development team that decided not to (or forgot to) include support for this pair of word boundaries.

On regex101.com, use \b word boundaries instead. \b matches at both the start and the end of a word and is supported by all 4 regex101.com regex flavors: PCRE, JS, Python and Go.
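The \b approach can be checked outside regex101.com as well. Below is a minimal sketch, assuming Python 3 and its standard re module; it reuses the test string from the PHP example, and the variable names are illustrative only.

```python
import re

# Same test string as in the PHP example.
text = "homeless and home"

# \b matches at both the start and the end of a word, so \bhome\b
# skips the "home" inside "homeless" and finds only the standalone word.
spans = [m.span() for m in re.finditer(r"\bhome\b", text)]
print(spans)
```

Only one span is reported, covering the final word home.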

These word boundaries are supported mainly by POSIX-like engines; see the PostgreSQL demo, for example. There, the regular expression [[:<:]]HR[[:>:]] finds a match in Head of HR but does not find a match in <A HREF="some.html or in CHROME.

Other regex engines that support the [[:<:]] and [[:>:]] word boundaries are base R (gsub without the perl=TRUE argument, for example) and MySQL.

In Tcl regex, there is \m for the starting word boundary ([[:<:]]) and \M for the ending word boundary ([[:>:]]).


Instead, you can use \b(?<=\w) or \b(?=\w). In any case, the PCRE engine converts [[:<:]] to \b(?=\w) and [[:>:]] to \b(?<=\w) before starting the match.
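These lookaround forms are not PCRE-specific. The sketch below tries them in Python 3's re module (an assumption made for illustration; the thread itself only shows PHP). The constant names START_OF_WORD and END_OF_WORD are hypothetical, and the sample strings are the ones from the PostgreSQL example in the other answer.

```python
import re

# Directional word boundaries written with lookarounds, mirroring the
# PCRE conversion of [[:<:]] and [[:>:]].
START_OF_WORD = r"\b(?=\w)"   # boundary followed by a word character
END_OF_WORD = r"\b(?<=\w)"    # boundary preceded by a word character

pattern = re.compile(START_OF_WORD + "HR" + END_OF_WORD)

samples = ["Head of HR", '<A HREF="some.html', "CHROME"]
results = {s: bool(pattern.search(s)) for s in samples}
print(results)

# Mark every start and every end of a word to show that the two
# assertions pick different positions, unlike a plain \b.
marked_starts = re.sub(START_OF_WORD, "<", "a bc")
marked_ends = re.sub(END_OF_WORD, ">", "a bc")
print(marked_starts)
print(marked_ends)
```

Only Head of HR matches, and the two substitutions mark word starts and word ends separately, which is the distinction a bare \b does not make.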


Source: https://habr.com/ru/post/1275223/

