Protocol Buffers and UTF-8

The history of encoding schemes, operating systems, and endianness has left string handling (for all alphabets) in a confusing state. For this reason, protocol buffers only deal with ASCII or UTF-8 in their string types, and I can't see any polymorphic overloads that accept a C++ wstring. The question then is: how is one expected to get a UTF-16 string into a protocol buffer?

Presumably I need to keep the data as a wstring in my application code and convert to UTF-8 before I put it into (or after I extract it from) the message. What is the easiest way to do this on both Windows and Linux? (A single function call from a well-supported library would make my day.)

The data will come from various web servers (Linux and Windows) and will end up in SQL Server (and possibly other endpoints).

- edit 1 -

Mark Wilkins' suggestion seems to fit the bill. Perhaps someone who has experience with the library can post a code snippet, from wstring to UTF-8, so I can gauge how easy it will be.

- edit 2 -

offer even more. Next, I will explore boost serialization.

+3
4 answers

This may be overkill, but the ICU libraries will do everything you need, and you can use them on both Windows and Linux.

Alternatively, on Windows the MultiByteToWideChar and WideCharToMultiByte functions convert between UTF-8 and UTF-16. For example:

// utf-8 to utf-16
MultiByteToWideChar( CP_UTF8, 0, myUtf8String, -1,
                     myUtf16Buf, lengthOfUtf16Buf );

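On Linux, libidn may do what you need. It converts between UTF-8 and UCS-4, which is, as far as I know, effectively the same as UTF-32. For example: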

// utf-8 to UCS
ucsStr = stringprep_utf8_to_ucs4( "asdf", 4, &items );

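On Linux, though, it is probably best to just work with UTF-8. Unless you already have UTF-16 data or libraries to deal with, I am not sure there is a compelling reason to use UTF-16 on Linux.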

+1

Boost Serialization includes a UTF-8 codecvt facet that can be used to convert Unicode wide strings to UTF-8 and back.
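To give a feel for the codecvt approach, here is a minimal sketch that uses the standard library's std::codecvt_utf8_utf16 with std::wstring_convert instead of the Boost facet. That swap is an assumption made to keep the example free of external dependencies, and the function names utf16_to_utf8 and utf8_to_utf16 are made up for illustration. Note that these standard facilities are deprecated since C++17, although common compilers still ship them.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper: UTF-16 code units -> UTF-8 bytes.
// Throws std::range_error if the input is not valid UTF-16.
std::string utf16_to_utf8(const std::u16string& in) {
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.to_bytes(in);
}

// Hypothetical helper: UTF-8 bytes -> UTF-16 code units.
// Throws std::range_error if the input is not valid UTF-8.
std::u16string utf8_to_utf16(const std::string& in) {
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.from_bytes(in);
}
```

The std::string returned by utf16_to_utf8 is what would be handed to the setter of a protocol buffer string field, and utf8_to_utf16 would be applied to the value read back out.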

+3

Take a look at UTF8-CPP:

// converts a UTF-8 encoded std::string s to a UTF-16 std::wstring ws
// (wchar_t is 16 bits wide on Windows; on Linux it is 32 bits)
utf8::utf8to16(s.begin(), s.end(), std::back_inserter(ws));
+3

On Linux, this is trivial: each wchar_t is one Unicode code point, and with some trivial bit-shifting you can find the corresponding UTF-8 bytes. On Windows, it is not much harder, since there is an API for it (out must already be large enough; calling the function with a null output buffer and a size of 0 returns the required length): WideCharToMultiByte(CP_UTF8, 0, input.c_str(), input.size(), &out[0], out.size(), 0, 0);
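The bit-shifting mentioned for the Linux case could look roughly like the sketch below. It takes a std::u32string so that it behaves the same on every platform, and the function name utf32_to_utf8 is hypothetical. It is an illustration of the encoding rules, not a replacement for a tested library.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: encode Unicode code points as UTF-8 bytes.
std::string utf32_to_utf8(const std::u32string& in) {
    std::string out;
    for (char32_t cp : in) {
        // Reject values outside the Unicode range and lone surrogates.
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("not a Unicode scalar value");
        if (cp < 0x80) {                 // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {         // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {       // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                         // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

On a platform where wchar_t is 32 bits wide, a std::wstring ws can be passed in as std::u32string(ws.begin(), ws.end()). This does not apply on Windows, where a wstring holds UTF-16 code units and surrogate pairs would have to be combined first.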

+1

Source: https://habr.com/ru/post/1730049/

