I am using Visual Studio 2010 to do regular expression matching with PCRE. Let's say I have a subject and a pattern stored as std::wstring in the
following way:
std::wstring subject = L"サービス内容";
std::wstring pattern = L"ス内";
As you can see, I'm trying to match Japanese strings, so I need the Unicode version of PCRE, i.e. pcre16 or pcre32, with functions such as pcre16_exec or pcre32_exec.
Unfortunately, it does not work. My problem is converting from wstring to unsigned short or unsigned int (depending on whether I use pcre16 or pcre32). I tried many approaches (wcstombs_s, string conversion via QString, etc.), but to no avail. The result of the exec function never contains the values I expect. I'm not quite sure what went wrong; matching patterns against ANSI strings with the plain pcre functions works fine. Here is a snippet:
pcre16 *re;
const char *error;
int erroffset;
int ovector[30];
int subject_length;
int rc;
std::wstring subjectstr = L"サービス内容";
std::wstring patternstr = L"ス内";
subject_length = 6;
const unsigned short pattern = ....
const unsigned short subject = ....
re = pcre16_compile(&pattern, PCRE_UTF16, &error, &erroffset, NULL);
rc = pcre16_exec(re, NULL, &subject, subject_length, 0, 0, ovector, 30);
Can someone please give me a working example of how to match Unicode patterns using PCRE, or explain what went wrong? This has been frustrating me for a while.
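For reference, here is a minimal sketch of the conversion step I am trying to get right. It is an illustration under stated assumptions, not a confirmed fix. It assumes the default pcre.h setup where PCRE_UCHAR16 is unsigned short, and to_utf16_units is a hypothetical helper name, not a PCRE function. With Visual Studio wchar_t is already 16 bits wide, so the loop is effectively a plain copy; the surrogate branch only matters on platforms where wchar_t is 32 bits. The idea is that pcre16_compile and pcre16_exec receive a pointer to the first element of a whole buffer rather than the address of a single unsigned short.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of PCRE): copies a std::wstring into a
// NUL-terminated buffer of UTF-16 code units stored as unsigned short.
std::vector<unsigned short> to_utf16_units(const std::wstring& text) {
    std::vector<unsigned short> units;
    units.reserve(text.size() + 1);
    for (std::size_t i = 0; i < text.size(); ++i) {
        unsigned long cp = static_cast<unsigned long>(text[i]);
        if (sizeof(wchar_t) > 2 && cp > 0xFFFF) {
            // 32-bit wchar_t holding a code point above the BMP:
            // encode it as a UTF-16 surrogate pair.
            cp -= 0x10000;
            units.push_back(static_cast<unsigned short>(0xD800 + (cp >> 10)));
            units.push_back(static_cast<unsigned short>(0xDC00 + (cp & 0x3FF)));
        } else {
            // 16-bit wchar_t (Windows) or a BMP character: copy the unit as-is.
            units.push_back(static_cast<unsigned short>(cp));
        }
    }
    units.push_back(0);  // terminator, needed for the pattern argument
    return units;
}

// Intended use with the snippet above (assumes pcre.h and the pcre16
// library are available; not compiled as part of this sketch):
//
//   std::vector<unsigned short> pattern = to_utf16_units(patternstr);
//   std::vector<unsigned short> subject = to_utf16_units(subjectstr);
//   re = pcre16_compile(&pattern[0], PCRE_UTF16, &error, &erroffset, NULL);
//   rc = pcre16_exec(re, NULL, &subject[0],
//                    static_cast<int>(subject.size()) - 1,  // length without the terminator
//                    0, 0, ovector, 30);
```

The subject length passed to pcre16_exec is counted in 16-bit units, not bytes, which is why the sketch derives it from the buffer size instead of hard-coding it. The wide string literals also depend on the source file being saved in an encoding the compiler reads correctly; that is a separate assumption this sketch does not address.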