What does \x mean in PHP PCRE?

Question


From the manual:

After \x, up to two hexadecimal digits are read (letters can be in upper or lower case). In UTF-8 mode, \x{...} is allowed, where the contents of the braces is a string of hexadecimal digits. It is interpreted as a UTF-8 character whose code number is the given hexadecimal number. The original hexadecimal escape sequence, \xhh, matches a two-byte UTF-8 character if the value is greater than 127.

What does that mean?

The code point of "ä" is E4 and its UTF-8 representation is C3 A4, but neither of these matches:

 $t = 'ä'; // same as "\xC3\xA4"
 preg_match('/\\xC3A4/u', $t); // doesn't match
 preg_match('/\\x00E4/u', $t); // doesn't match

With curly braces around the code point, it matches:

 preg_match('/\\x{00E4}/u', $t); // matches
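The following script reproduces the three patterns from the question and shows how the bytes C3 A4 relate to the code point E4. It is a sketch that assumes PHP 7.0 or later (for the "\u{E4}" string escape, used so the subject does not depend on how the source file is saved); the helper utf8_two_byte_to_code_point is hypothetical and exists only to decode the two-byte UTF-8 form by hand.

```php
<?php
// Build "ä" from its code point so the result does not depend on the file encoding.
$t = "\u{E4}";

// The UTF-8 encoding of U+00E4 is the two bytes C3 A4.
echo bin2hex($t), "\n"; // c3a4

// Hypothetical helper: decode a two-byte UTF-8 sequence (110xxxxx 10yyyyyy) by hand.
function utf8_two_byte_to_code_point(string $s): int
{
    $lead = ord($s[0]); // 0xC3 = 110 00011
    $cont = ord($s[1]); // 0xA4 = 10 100100
    return (($lead & 0x1F) << 6) | ($cont & 0x3F);
}

printf("U+%04X\n", utf8_two_byte_to_code_point($t)); // U+00E4

// The three patterns from the question; preg_match() returns 1 on a match, 0 otherwise.
var_dump(preg_match('/\xC3A4/u', $t));   // int(0): \xC3 (Ã) followed by "A4"
var_dump(preg_match('/\x00E4/u', $t));   // int(0): \x00 (NUL) followed by "E4"
var_dump(preg_match('/\x{00E4}/u', $t)); // int(1): the single code point U+00E4
```

Neither of the first two patterns is wrong syntax; they simply describe a different character sequence than the asker intended.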


Tags: php, regex, pcre

AndreKR Aug 29 '13 at 23:40


1 answer

user2246674 · Accepted Answer · 2013-08-29T23:43:51+0000

The \x syntax is a way to specify a character by its value (code point):

\xAB specifies a code point in the range 0-FF (at most two hex digits are read).
\x{ABCD} specifies a code point given by any number of hex digits inside the braces (in UTF-8 mode, up to 10FFFF).
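To see that \xE4 names a character rather than a byte in UTF-8 mode, it helps to compare the same escape with and without the u modifier. This is a minimal sketch assuming PHP 7.0+ (only for the "\u{E4}" literal). The last check uses 0x84, the CP437 byte for "ä", which is not valid UTF-8, so a /u match reports an error instead of simply not matching.

```php
<?php
$utf8 = "\u{E4}"; // "ä" encoded as UTF-8: bytes C3 A4

// UTF-8 mode (/u): \xE4 and \x{E4} both denote the code point U+00E4,
// which PCRE matches against the two-byte sequence C3 A4.
var_dump(preg_match('/^\xE4$/u', $utf8));   // int(1)
var_dump(preg_match('/^\x{E4}$/u', $utf8)); // int(1)

// Byte mode (no /u): \xE4 is the single byte E4, which does not occur in C3 A4...
var_dump(preg_match('/\xE4/', $utf8));       // int(0)
// ...while the two raw bytes, written as one \xhh each, do match.
var_dump(preg_match('/^\xC3\xA4$/', $utf8)); // int(1)

// CP437 encodes "ä" as the single byte 0x84, which is not valid UTF-8:
// in /u mode preg_match() fails (returns false) instead of returning 0.
$cp437 = "\x84";
var_dump(preg_match('/\x{E4}/u', $cp437));           // bool(false)
var_dump(preg_last_error() === PREG_BAD_UTF8_ERROR); // bool(true)
```

In other words, the u modifier changes what a \xhh value refers to (a code point instead of a byte) and also requires the subject to be valid UTF-8.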

The manual's wording is a bit confusing, perhaps in an attempt to be precise. Code points 128-255 (and, in fact, everything up to 7FF) are encoded as two bytes in UTF-8, so in UTF-8 mode \xE4 matches the two-byte sequence C3 A4, not the single byte E4. Consequently, a Unicode (/u) regex will match pure 7-bit ASCII, but not text in other encodings or code pages (e.g. CP437) that use single bytes in that range. In a roundabout way, the manual is saying that a Unicode regex is only suitable for correctly encoded input.

However, this does not mean that \xABCD is parsed as \x{ABCD} (one character). It is parsed as \xAB (one character) followed by CD (two characters)¹. The braces exist to resolve this parsing ambiguity:

After \x, up to two hexadecimal digits are read. In UTF-8 mode, \x{...} is allowed.
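The two-digit limit is easiest to check with plain ASCII, where byte and code point coincide: \x41 is "A", so \x41BC is read as "A" followed by the literal characters "B" and "C". A small sketch (assuming PHP 7.0+ for "\u{41BC}"; the test strings are arbitrary):

```php
<?php
// \x reads at most two hex digits: \x41BC is "A" (0x41) followed by the literals "BC".
var_dump(preg_match('/^\x41BC$/', 'ABC')); // int(1)

// So it does NOT match the single code point U+41BC...
var_dump(preg_match('/^\x41BC$/u', "\u{41BC}")); // int(0)

// ...whereas braces delimit the hex digits explicitly and do.
var_dump(preg_match('/^\x{41BC}$/u', "\u{41BC}")); // int(1)

// Leading zeros inside the braces are fine.
var_dump(preg_match('/^\x{0041}$/u', 'A')); // int(1)
```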

Some other languages use \u instead of \x for the longer form.

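¹ That is, the pattern \xC3A4 matches "Ã" (U+00C3) followed by the two literal characters "A" and "4":

 preg_match('/\xC3A4/u', "\u{C3}" . "A4"); // 1 (PHP 7+; "\u{C3}" is the UTF-8 encoding of Ã)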

