How to use PCRE with Unicode strings?

I am using Visual Studio 2010 to run regular expressions with PCRE. Let's say I have a subject and a pattern stored in std::wstring in the following way:

std::wstring subject = L"サービス内容";
std::wstring pattern = L"ス内";

As you can see, I'm trying to find Japanese strings, so I need the Unicode version of PCRE, for example pcre16 or pcre32, with functions such as pcre16_exec or pcre32_exec.

Unfortunately, it does not work. My problem is converting from wstring to unsigned short or unsigned int (depending on pcre16 or pcre32). I tried many approaches (wcstombs_s, string conversion with QString, etc.), but to no avail. The result of the exec function never contains the values I expect. I'm not quite sure what went wrong; matching patterns against ANSI strings with the plain pcre functions works fine. Here is a snippet:

pcre16 *re;
const char *error;
int erroffset;
int ovector[30]; // The result of the matching
int subject_length;
int rc;

std::wstring subjectstr = L"サービス内容";
std::wstring patternstr = L"ス内";
subject_length = 6;

const unsigned short pattern = .... // string conversion from patternstr
const unsigned short subject = .... // string conversion from subjectstr

re = pcre16_compile(&pattern, PCRE_UTF16, &error, &erroffset, NULL);
rc = pcre16_exec(re, NULL, &subject, subject_length, 0, 0, ovector, 30);

Can someone please give me a working example of how to match Unicode patterns using PCRE, or explain what went wrong? It is driving me crazy.

2 answers

A std::wstring can be converted to a C string with c_str:

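// Valid only where wchar_t is 16 bits wide, as with Visual Studio on Windows.
const unsigned short *pattern = reinterpret_cast<const unsigned short *>(patternstr.c_str());
const unsigned short *subject = reinterpret_cast<const unsigned short *>(subjectstr.c_str());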

Also note that PCRE supports UTF-8, UTF-16, and UTF-32. PCRE_UTF16 is the option for the 16-bit library.
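Reinterpreting the c_str() pointer as 16-bit units only works where wchar_t is itself 16 bits wide, as it is with Visual Studio. Where wchar_t is 32 bits, the string has to be re-encoded first. A minimal sketch of such a conversion follows; to_utf16 is a hypothetical helper name, not part of PCRE, and it assumes the wstring holds valid Unicode code points or UTF-16 code units.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: converts a std::wstring into UTF-16 code units.
// Values that already fit in 16 bits are copied through unchanged; code
// points above U+FFFF (possible only with a 32-bit wchar_t) are split into
// a surrogate pair.
std::vector<unsigned short> to_utf16(const std::wstring &text) {
    std::vector<unsigned short> units;
    units.reserve(text.size());
    for (std::size_t i = 0; i < text.size(); ++i) {
        unsigned long cp = static_cast<unsigned long>(text[i]);
        if (cp >= 0x10000UL) {
            cp -= 0x10000UL;
            units.push_back(static_cast<unsigned short>(0xD800UL + (cp >> 10)));
            units.push_back(static_cast<unsigned short>(0xDC00UL + (cp & 0x3FFUL)));
        } else {
            units.push_back(static_cast<unsigned short>(cp));
        }
    }
    return units;
}
```

The returned vector is not zero-terminated. pcre16_compile expects a zero-terminated pattern, so a 0 would need to be appended before passing the pattern's data; the subject can be passed together with its size() as the length.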


Cast the wchar_t pointer to const unsigned short * (PCRE_SPTR16). .... The result ends up in subStrVec:

pcre16 *reCompiled;
int pcreExecRet;
int subStrVec[30];
const char *pcreErrorStr;
int pcreErrorOffset;  

std::wstring pattern = L"容内容";
std::wstring subject = L"容容容内容容容";

const wchar_t* aStrRegex = pattern.c_str();
const wchar_t* line = subject.c_str();

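// PCRE_UTF16 is the UTF option that belongs to the 16-bit library.
reCompiled = pcre16_compile((PCRE_SPTR16)aStrRegex, PCRE_UTF16, &pcreErrorStr, &pcreErrorOffset, NULL);
// The subject length is given in 16-bit code units.
pcreExecRet = pcre16_exec(reCompiled, NULL, (PCRE_SPTR16)line, (int)wcslen(line), 0, 0, subStrVec, 30);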
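To read the result, note that after a successful pcre16_exec call the first two entries of the output vector hold the start and end offsets of the whole match, counted in 16-bit code units, and that a negative return value means no match or an error. A minimal sketch of pulling the matched text back out of the subject follows; whole_match is a hypothetical helper name, and it assumes wchar_t is 16 bits wide so the offsets index directly into the std::wstring.

```cpp
#include <string>

// Hypothetical helper: returns the text of the whole match, or an empty
// string when the exec call reported no match or an error (rc < 0).
// ovector[0] is the start offset and ovector[1] the end offset of the match.
std::wstring whole_match(const std::wstring &subject, int rc, const int *ovector) {
    if (rc < 0) {
        return std::wstring();
    }
    return subject.substr(ovector[0], ovector[1] - ovector[0]);
}
```

With the variables from the snippet above it would be called as whole_match(subject, pcreExecRet, subStrVec). For that subject and pattern, the first match should start at offset 2 and end at offset 5.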

Source: https://habr.com/ru/post/1502819/

