The main difference is that the preg_ functions use the PCRE library, whereas the mb_ereg_ functions (including mb_split) use the Oniguruma library (used in Ruby before version 2.0).
The main advantage is that Oniguruma can work with several encodings (ASCII, UTF-8, UTF-16BE, UTF-16LE, UTF-32BE, UTF-32LE, EUC-JP, EUC-TW, EUC-KR, EUC-CN, Shift_JIS, Big5, GB18030, KOI8-R, CP1251, ISO-8859-1, ISO-8859-2, ISO-8859-3, ISO-8859-4, ISO-8859-5, ISO-8859-6, ISO-8859-7, ISO-8859-8, ISO-8859-9, ISO-8859-10, ISO-8859-11, ISO-8859-13, ISO-8859-14, ISO-8859-15, ISO-8859-16), whereas PCRE cannot (the preg_ functions treat the subject as single bytes, or as UTF-8 with the u modifier).
Note that this list lacks many of the encodings available to other mb_ functions such as mb_detect_encoding (UTF-7, ArmSCII-8, CP866), which limits the usefulness of the mb_ereg_ functions, since you need to convert the string to a supported encoding before working on it and convert it back afterwards.
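The round trip described above can be sketched as follows. This is a minimal illustration, assuming the mbstring extension (with its regex support) is loaded and that the script file itself is saved as UTF-8; the sample text and variable names are made up for the example.

```php
<?php
// Simulate input that arrives in CP866, an encoding missing from the list above.
$input = mb_convert_encoding('Привет, мир', 'CP866', 'UTF-8');

// 1. Convert to an encoding the regex engine supports.
$utf8 = mb_convert_encoding($input, 'UTF-8', 'CP866');

// 2. Tell the mb_ereg_ functions which encoding to assume, then run the regex.
mb_regex_encoding('UTF-8');
$replaced = mb_ereg_replace('мир', 'PHP', $utf8);

// 3. Convert the result back to the original encoding.
$output = mb_convert_encoding($replaced, 'CP866', 'UTF-8');
```

The two extra conversions are the cost the paragraph above refers to: for such encodings, mb_ereg_ offers no advantage over converting to UTF-8 and using preg_ with the u modifier.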
The two regex engines have more or less the same features; however, there are some differences (a non-exhaustive list, in no particular order):
Oniguruma does not support:
- Unicode character classes with a single letter written without curly braces. Example: \pN is read as pN; you need to write \p{N}
- The PCRE-specific Unicode character classes: Xan, Xps, Xsp, Xwd
- Unescaped square brackets in a character class: Oniguruma sees [][] as two empty character classes, whereas PCRE sees a character class containing ] and [
- The \K feature
- The alias \R for newline
- Named groups that use the Python syntax (?P<name>...). Only (?<name>...) or (?'name'...) are allowed.
- References to groups with anything other than the Oniguruma syntax \g<name> (the Perl syntax (?&name), (?1) or (?R) is not allowed).
- Backtracking control verbs
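For contrast, here is a short sketch of the PCRE side of this list, using only preg_ functions. It shows what PCRE accepts and makes no attempt to reproduce Oniguruma's behavior; the sample strings and variable names are illustrative.

```php
<?php
// \pN without braces is a valid single-letter Unicode property in PCRE.
preg_match('/\pN+/u', 'abc123', $digits);

// [][] is one character class containing ] and [.
preg_match_all('/[][]/', 'a[b]c', $brackets);

// \K resets the start of the match, so only "bar" is replaced.
$kept = preg_replace('/foo\Kbar/', 'X', 'foobar');

// \R matches any newline sequence, with \r\n treated as one unit.
$lines = preg_split('/\R/', "a\r\nb\nc");

// Python-style named group.
preg_match('/(?P<year>\d{4})/', 'in 2024', $date);

// Perl-style recursion into group 1 to match balanced parentheses.
$balanced = preg_match('/^(\((?:[^()]|(?1))*\))$/', '(a(b)c)');
```

Here $digits[0] is '123', $brackets[0] holds '[' and ']', $kept is 'fooX', $lines is ['a', 'b', 'c'], $date['year'] is '2024', and $balanced is 1.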
PCRE does not support:
- Duplicate named groups (by default). To enable this feature, you need to use the modifier (?J).
- Numbered backreferences with the syntax \k<...>. You can write \k<name>, but not \k<1> or \k<-1>.
- Backreferences to a specific nesting level. Oniguruma can do this with \k<name+n>, where n is the nesting level.
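A small sketch of how the first two points look from the preg_ side. The \g{-1} form is PCRE's own syntax for a relative numbered backreference, shown here as the usual substitute for \k<-1>; the patterns and variable names are illustrative only.

```php
<?php
// (?J) allows two groups to share the name "id".
$ok = preg_match('/(?J)(?<id>\d+)|(?<id>[a-z]+)/', 'abc', $m);

// A relative numbered backreference in PCRE is written \g{-1}, not \k<-1>.
$rel = preg_match('/(\w)\g{-1}/', 'hello', $pair);
```

Both calls return 1; $m[0] is 'abc' and $pair[0] is 'll'.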
To match newlines with the dot, Oniguruma uses the m modifier, whereas PCRE uses the s modifier. In the mb_ereg_ functions, the dot matches newlines by default. (Thus, the m modifier is enabled by default.)
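The difference in defaults can be checked with a two-line comparison. This sketch assumes the mbstring extension is loaded and that the mb_regex options have not been changed from their defaults (for example via mb_regex_set_options).

```php
<?php
$text = "a\nb";

// PCRE: the dot does not match a newline unless the s modifier is given.
$pcreDefault = preg_match('/a.b/', $text);
$pcreDotAll  = preg_match('/a.b/s', $text);

// Oniguruma via mbstring: with the default options the dot is expected
// to match the newline without any modifier.
$onigDefault = (bool) mb_ereg('a.b', $text);
```

With default settings, $pcreDefault is 0, $pcreDotAll is 1, and $onigDefault should be true.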
PCRE uses the s modifier to match newlines with the dot. The m modifier behaves differently in PCRE: it changes the meaning of the ^ and $ anchors from "start" and "end" of the string to "start" and "end" of the line.
With Oniguruma, the meaning of these anchors does not change; they always match at the start and end of a line. To match the limits of the string, it uses \A and \z, which are also available in PCRE.
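The PCRE half of this description can be illustrated with preg_ functions alone. The sketch below does not exercise Oniguruma; the sample string and variable names are illustrative.

```php
<?php
$text = "foo\nbar";

// Without m, ^ and $ refer to the whole string, so a single word cannot span it.
$plain = preg_match_all('/^\w+$/', $text);

// With m, ^ and $ match at every line boundary.
$multi = preg_match_all('/^\w+$/m', $text, $lines);

// \A and \z keep their string-level meaning even when m is set.
$whole = preg_match_all('/\A\w+\z/m', $text);
$first = preg_match('/\A\w+$/m', $text, $m);
```

Here $plain is 0, $multi is 2 with $lines[0] equal to ['foo', 'bar'], $whole is 0, and $first is 1 with $m[0] equal to 'foo'.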
Note that Oniguruma was forked to create Onigmo (used in current versions of Ruby), which implements more Perl features and syntax elements and is therefore closer to PCRE.