PHP PCRE: a concrete example of the use and usefulness of the "S" modifier (extra analysis of pattern)?

The PHP manual says the following about the PCRE "S" modifier (extra analysis of pattern) at http://php.net/manual/en/reference.pcre.pattern.modifiers.php

S

When a pattern is going to be used several times, it is worth spending more time analyzing it in order to speed up the time taken for matching. If this modifier is set, then this extra analysis is performed. At present, studying a pattern is useful only for non-anchored patterns that do not have a single fixed starting character.

So its use is tied to patterns that are used several times and that are neither anchored (e.g. with ^ or $) nor begin with a fixed character sequence, unlike a pattern such as '/^abc/'.

But there are no specifics about where, for example, to apply this modifier or how it actually works.

Does it apply only to the PHP thread of the currently executing script, with the "cached" analysis of the pattern discarded once the script finishes? Or does the engine store the analysis in a global cache that is then available to multiple PHP threads that use PCRE with a pattern marked with this modifier?

Also, from the PCRE introduction: http://php.net/manual/en/intro.pcre.php

Note: This extension maintains a global per-thread cache of compiled regular expressions (up to 4096).

If the "S" modifier applies only per thread, how does it differ from the PCRE cache of compiled regular expressions? I assume some additional information is stored, much like MySQL does when it indexes the rows of a table (except that in the PCRE case this additional information is kept in memory).

And last but not least, has anyone had a real use case for this modifier, noticed an improvement, and appreciated its benefits?

Thank you for your attention.

1 answer

The PHP docs reproduce only a small part of the PCRE docs. Here are some more details (emphasis mine) from PCRE 8.36:

If a compiled pattern is going to be used several times, it is worth spending more time analyzing it in order to speed up the time taken for matching. The function pcre_study() takes a pointer to a compiled pattern as its first argument. If studying the pattern produces additional information that will help speed up matching, pcre_study() returns a pointer to a pcre_extra block, in which the study_data field points to the results of the study.

...

Studying a pattern does two things: first, a lower bound for the length of subject string that is needed to match the pattern is computed. This does not mean that there are any strings of that length that match, but it does guarantee that no shorter strings match. The value is used to avoid wasting time by trying to match strings that are shorter than the lower bound. You can find out the value in a calling program via the pcre_fullinfo() function.

Studying a pattern is also useful for non-anchored patterns that do not have a single fixed starting character. A bitmap of possible starting bytes is created. This speeds up finding a position in the subject at which to start matching. (In 16-bit mode, the bitmap is used for 16-bit values less than 256. In 32-bit mode, the bitmap is used for 32-bit values less than 256.)
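To make those two optimizations concrete, here is a rough PHP sketch that imitates them by hand. It is not how PCRE implements the study pass internally. The pattern, the MIN_LENGTH and START_BYTES constants, and the studiedMatch() helper are all hypothetical and were worked out manually for this one pattern.

```php
<?php
// Illustration only: hand-written stand-ins for the two facts that a
// study pass records. All names below are hypothetical.

$pattern = '/(?:cat|dog|bird)s?/';

// 1. Lower bound on subject length: the shortest alternative is 3 bytes.
const MIN_LENGTH = 3;

// 2. Possible starting bytes: every match must begin with b, c, or d.
const START_BYTES = 'bcd';

function studiedMatch(string $pattern, string $subject): ?string
{
    // Shorter subjects can never match, so skip the regex engine entirely.
    if (strlen($subject) < MIN_LENGTH) {
        return null;
    }

    // Jump to the first byte that could begin a match instead of
    // attempting a match at every offset.
    $offset = strcspn($subject, START_BYTES);
    if ($offset === strlen($subject)) {
        return null; // no candidate starting byte anywhere
    }

    // Hand the remaining work to the regex engine, starting at $offset.
    if (preg_match($pattern, $subject, $m, 0, $offset) === 1) {
        return $m[0];
    }
    return null;
}
```

With this helper, a two-byte subject is rejected by the length check alone, and a subject containing none of b, c, or d is rejected without ever calling preg_match().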

Note that in the later version of PCRE (v10.00, also known as PCRE2), the library underwent a massive restructuring and an API rework. One consequence is that studying is always performed in PCRE 10.00 and above. I don't know when PHP will move to PCRE2, but it will happen sooner or later, because PCRE 8.x will no longer get any new features.

Here is a quote from the PCRE2 release announcement:

Explicit "studying" of compiled patterns has been abolished - it now always happens automatically. JIT compiling is done by calling a new function, pcre2_jit_compile(), after a successful return from pcre2_compile().


Regarding your second question:

If the "S" modifier applies only per thread, how does it differ from the PCRE cache of compiled regular expressions?

PCRE itself has no cache, but PHP maintains a cache of compiled regexes to avoid recompiling the same pattern over and over, for example when you call a preg_ function inside a loop.
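As a minimal sketch of that loop scenario: the same pattern string is passed to preg_match() on every iteration, so PHP can reuse the compiled pattern rather than compiling it again. The countMatches() function and the sample data are hypothetical. The pattern is non-anchored with no single fixed first character, which is the case where the manual says studying can help. As far as I know, PHP 7.3 and later are built against PCRE2, where the S modifier is still accepted but has no effect, so any gain would only show up on older builds.

```php
<?php
// Sketch: count the lines that match a pattern reused on every iteration.
// Function name and sample data are hypothetical.

function countMatches(array $lines, string $pattern): int
{
    $hits = 0;
    foreach ($lines as $line) {
        // preg_match() returns 1 on a match, 0 on no match, false on error.
        if (preg_match($pattern, $line) === 1) {
            $hits++;
        }
    }
    return $hits;
}

$lines = ['error: disk full', 'ok', 'warning: low memory', 'ok'];

// The trailing S asks for the extra analysis on builds that support it.
$pattern = '/error|warning/S';

echo countMatches($lines, $pattern), "\n"; // prints 2
```

The modifier never changes which strings match, only (potentially) how quickly a start position is found, so the result is the same with or without it.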


Source: https://habr.com/ru/post/982700/

