PCRE: return the offset number of the matching subpattern

I want to group a large number of patterns that are to be matched against various HTML elements, attributes, and text in web documents.

For example, I might be interested in the contents of a <title> element and have a regular expression:

pcre *test_filter = pcre_compile("(google|stackoverflow|expertsexchange)",0,&error,&erroffset,NULL);

If I were to test the input string "stackoverflow", I am wondering whether it is possible to somehow get the offset of the match within this group, that is, 1 in this case, 0 for google and 2 for expertsexchange.

Ideally, I'm going to concatenate a bunch of text strings, and this seems like the most obvious way to find out which member of the group matched, rather than running extra regular expressions.

Is there such functionality in PCRE?

1 answer

The pattern you give is fine for finding the string that matched, but you would then need to (at least) look up the matched value to get its index within the group. If you change the pattern so that each word is in its own capture group, you can use the return value of pcre_exec() to get the index (plus 1) of the last capture group that was set.

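If the pattern is "(google)|(stackoverflow)|(expertsexchange)", then, if google matched, pcre_exec() would return 2 (3 for stackoverflow, and so on), because the return value is one more than the number of the highest capture group that was set. This assumes the ovector passed to pcre_exec() is large enough to hold every group. Subtracting 2 gives the zero-based offset asked for in the question.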


Source: https://habr.com/ru/post/1656885/

