Is there a faster way to convert a multibyte character string to a Unicode wstring?

Question

Is there a faster way to convert a multibyte character string to a Unicode wstring?

In my project I use the Aho-Corasick algorithm for server-side message filtering, and the messages the server receives are multibyte character strings. After several tests, I found that the bottleneck is the conversion between the multibyte string and the Unicode wstring. I currently use the pair mbstowcs_s and wcstombs_s, which accounts for almost 95% of the total cost of the filter. I also tried MultiByteToWideChar / WideCharToMultiByte and got the same result. Is there a more efficient way to do this? My project is built with VS2005, and the converted strings will contain Chinese characters. Many thanks.


c windows multibyte

Avalon Jan 27 '10 at 9:56


4 answers

Michael J · Answer 1 · 2010-01-27T12:23:09+0000

There are several possibilities.

First, what do you mean by "multibyte character"? Do you mean UTF-8 or an ISO DBCS encoding?

If you look at the definitions of UTF-8 and UTF-16, you can write a highly optimized conversion by pulling out the "x" (payload) bits and repacking them. See, for example, http://www.faqs.org/rfcs/rfc2044.html, which covers UTF-8 <==> UTF-32. Adapting it for UTF-16 is easy.
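The bit-repacking idea can be sketched roughly as follows. This is a minimal illustration, not a drop-in replacement for mbstowcs_s: it assumes the input really is UTF-8 (not a DBCS code page such as GBK), the function name utf8_to_utf16 is hypothetical, and uint16_t stands in for the 16-bit wchar_t used on Windows. It does not reject overlong encodings or encoded surrogates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: converts `len` bytes of UTF-8 in `src` into UTF-16
 * code units in `dst` (capacity `cap` units). Returns the number of units
 * written, or (size_t)-1 on malformed/truncated input or lack of space.
 * Simplification: overlong forms and encoded surrogates are not rejected. */
size_t utf8_to_utf16(const unsigned char *src, size_t len,
                     uint16_t *dst, size_t cap)
{
    size_t i = 0, out = 0;
    assert(src != NULL && dst != NULL);

    while (i < len) {
        unsigned char b = src[i++];
        uint32_t cp;   /* code point being assembled */
        int extra;     /* number of continuation bytes expected */

        /* The lead byte tells us how many payload ("x") bits it carries. */
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else return (size_t)-1;      /* stray continuation or invalid lead */

        if ((size_t)extra > len - i) return (size_t)-1;  /* truncated */

        /* Each continuation byte 10xxxxxx contributes six payload bits. */
        while (extra-- > 0) {
            unsigned char c = src[i++];
            if ((c & 0xC0) != 0x80) return (size_t)-1;
            cp = (cp << 6) | (uint32_t)(c & 0x3F);
        }

        if (cp < 0x10000) {
            if (out + 1 > cap) return (size_t)-1;
            dst[out++] = (uint16_t)cp;
        } else {
            /* Outside the BMP: split into a surrogate pair. */
            if (cp > 0x10FFFF || out + 2 > cap) return (size_t)-1;
            cp -= 0x10000;
            dst[out++] = (uint16_t)(0xD800 | (cp >> 10));
            dst[out++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
    }
    return out;
}
```

Whether a hand-rolled loop like this actually beats the CRT or Win32 routines would need to be measured on the real message traffic.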

The second option may be to work entirely in UTF-16. Serve your web page (or UI dialog, or whatever it is) in UTF-16 and receive the user input that way.

, , Aho-Corasick. , , .

[ 29 2010] . http://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt, C mbtowc() wctomb(). wchar_ts. 16- wchar_ts, .

, ( ) .

acron · Answer 2 · 2010-01-27T10:05:21+0000

( ), (mbstowcs wcstombs). , . , (, a-z, 0-9), .?

Alex budovski · Answer 3 · 2010-01-27T10:12:05+0000

Perhaps you can reduce the number of calls to MultiByteToWideChar?

Avi · Answer 4 · 2010-01-27T10:32:25+0000

Perhaps you can also adapt Aho-Corasick to work directly on multibyte strings.
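A rough sketch of what a byte-oriented Aho-Corasick matcher could look like, so that no conversion to wchar_t is needed at all. All names here (ac_t, ac_init, ac_add, ac_build, ac_contains) and the fixed node capacity are hypothetical choices for illustration. It assumes the patterns are stored in the same encoding as the incoming messages. One caveat: with UTF-8 a byte-level match always falls on character boundaries, but with a DBCS code page a match could in principle start on the trail byte of a character, so that case would need an extra boundary check.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define AC_MAX_NODES 256   /* illustrative fixed capacity */

typedef struct {
    int next[AC_MAX_NODES][256]; /* transitions on raw byte values */
    int fail[AC_MAX_NODES];      /* failure links */
    int hit[AC_MAX_NODES];       /* nonzero if a pattern ends at or via this node */
    int count;                   /* nodes in use; node 0 is the root */
} ac_t;

void ac_init(ac_t *ac)
{
    memset(ac, 0, sizeof *ac);
    ac->count = 1;
}

/* Adds one pattern (raw bytes). Must be called before ac_build.
 * Returns 0 on success, -1 if the node capacity is exhausted. */
int ac_add(ac_t *ac, const unsigned char *pat, size_t len)
{
    int s = 0;
    assert(ac != NULL && pat != NULL);
    for (size_t i = 0; i < len; i++) {
        if (!ac->next[s][pat[i]]) {          /* 0 means "no edge yet" */
            if (ac->count == AC_MAX_NODES) return -1;
            ac->next[s][pat[i]] = ac->count++;
        }
        s = ac->next[s][pat[i]];
    }
    ac->hit[s] = 1;
    return 0;
}

/* Breadth-first pass: computes failure links and fills in the missing
 * transitions, turning the trie into a complete automaton. */
void ac_build(ac_t *ac)
{
    int queue[AC_MAX_NODES], head = 0, tail = 0;

    for (int c = 0; c < 256; c++)
        if (ac->next[0][c]) queue[tail++] = ac->next[0][c];

    while (head < tail) {
        int s = queue[head++];
        ac->hit[s] |= ac->hit[ac->fail[s]];  /* inherit matches of suffixes */
        for (int c = 0; c < 256; c++) {
            int t = ac->next[s][c];
            if (t) {
                ac->fail[t] = ac->next[ac->fail[s]][c];
                queue[tail++] = t;
            } else {
                ac->next[s][c] = ac->next[ac->fail[s]][c];
            }
        }
    }
}

/* Returns 1 if any pattern occurs in the byte string, else 0. */
int ac_contains(const ac_t *ac, const unsigned char *text, size_t len)
{
    int s = 0;
    for (size_t i = 0; i < len; i++) {
        s = ac->next[s][text[i]];
        if (ac->hit[s]) return 1;
    }
    return 0;
}
```

Typical use would be: call ac_init once, ac_add for every filter word, ac_build once, then ac_contains on each incoming message buffer as received. The 256-way transition table trades memory for a single array lookup per input byte.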
