What exactly does U+ mean, and why can't I create a table of Unicode string literals in my C++ application?

I am trying to convert an application from Java + Swing to C++ + Qt. At some point, I had to deal with some Unicode string literals. In Java, this was pretty simple:

private static String[] hiraganaTable = {
    "\u3042", "\u3044", "\u3046", "\u3048", "\u304a", 
    "\u304b", "\u304d", "\u304f", "\u3051", "\u3053", 
    ...
};

... whereas in C++ I am having problems:

QString hiraganaTable[] = {
    "\x30\x42", "\x30\x44", "\x30\x46", "\x30\x48", "\x30\x4a", 
    "\x30\x4b", "\x30\x4d", "\x30\x4f", "\x30\x51", "\x30\x53", 
    ...
};

I could not use \u in VS2008 because I got a bunch of warnings of the form:

character represented by universal-character-name '\u3042' cannot be represented in the current code page (1250)

And do not call me stupid: I tried File -> Advanced Save Options, but it made no difference; the code page did not change at all. This seems to be a known issue: How to create a UTF-8 string literal in Visual C++ 2008

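So I thought the solution would be to convert the whole table in Vim, turning each \u3042 into \x30\x42. However, QStrings did not handle the result properly. I tried fromAscii(), fromUtf8(), fromLocal8Bit(), and QString(QByteArray), and none of them worked. Then I found that U+3042, saved to a file, is stored as the bytes "E3 81 82". That byte sequence worked with QString::fromAscii(). So what does the "U+" in "U+3042" mean (0xE38182 - 0x3042 = 0xE35140; is that some kind of Unicode offset?). And is this UTF-8?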

+3
3

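C++, like C, is ASCII-oriented by default. A C string literal such as "abc" is an array of 8-bit chars. Visual C++ also supports 16-bit Unicode (UTF-16) string literals, written with an L prefix: L"abc\u3042". Such a literal has type wchar_t[N] instead of char[N], and the matching string class is std::wstring.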

Qt supports wchar_t, so such strings can be converted to QStrings.
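As a rough illustration of the wide-literal approach, here is a small standard-C++ sketch (no Qt required). The names hiraganaWide, firstUnit and wideLength are made up for this example. With Qt, QString::fromWCharArray() or QString::fromStdWString() would be the usual way to turn such strings into QStrings; that step is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical table name. Each entry is a wide string literal, so the
// compiler stores the code point in wchar_t units rather than 8-bit chars.
const wchar_t* const hiraganaWide[] = {
    L"\u3042", L"\u3044", L"\u3046", L"\u3048", L"\u304a",
};

// Numeric value of the first wchar_t in a wide string.
unsigned long firstUnit(const wchar_t* s) {
    assert(s != 0);
    return static_cast<unsigned long>(s[0]);
}

// Number of wchar_t units in a wide string (not counting the terminator).
std::size_t wideLength(const wchar_t* s) {
    assert(s != 0);
    return std::wstring(s).size();
}
```

Here firstUnit(hiraganaWide[0]) is 0x3042 and wideLength(hiraganaWide[0]) is 1: the character occupies a single wchar_t, which is exactly what the narrow "\x30\x42" literal could not express.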

+3

Yes, what you found is the UTF-8 encoding of that character.

>>> u'\u3042'.encode('utf-8').encode('hex')
'e38182'

Since the bytes are UTF-8, treat them as UTF-8 when converting.

The "U+" prefix is just notation: it indicates a Unicode code point, written in hexadecimal.
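To make the relationship between the code point U+3042 and the bytes E3 81 82 concrete, here is a minimal sketch of a UTF-8 encoder in standard C++. The function name encodeUtf8 is invented for this example, and the sketch does not reject surrogate code points (U+D800 to U+DFFF), which a production encoder should.

```cpp
#include <cassert>
#include <string>

// Encode one Unicode code point (U+0000..U+10FFFF) as a UTF-8 byte string.
// Sketch only: surrogate code points are not rejected.
std::string encodeUtf8(unsigned long cp) {
    assert(cp <= 0x10FFFFul);
    std::string out;
    if (cp < 0x80) {                       // 1 byte:  0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

encodeUtf8(0x3042) produces the three bytes E3 81 82. The bits of the code point are spread across the payload bits of those bytes, which is why subtracting 0x3042 from 0xE38182 does not give a meaningful offset.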

EDIT:

Also, the escaped table can be generated with Python (this is Python 2 syntax):

>>> print ',\n'.join(', '.join('"%s"' % (y.encode('utf-8').encode('string-escape')
      ,) for y in x) for x in [u'あいうえお', u'かきくけこ', u'さしすせそ'])
"\xe3\x81\x82", "\xe3\x81\x84", "\xe3\x81\x86", "\xe3\x81\x88", "\xe3\x81\x8a",
"\xe3\x81\x8b", "\xe3\x81\x8d", "\xe3\x81\x8f", "\xe3\x81\x91", "\xe3\x81\x93",
"\xe3\x81\x95", "\xe3\x81\x97", "\xe3\x81\x99", "\xe3\x81\x9b", "\xe3\x81\x9d"
+4

"U+dddd", where each d is a hexadecimal digit, denotes a Unicode code point.

A 16-bit value cannot be stored in an 8-bit char; it simply does not fit.

Instead, use wide string literals, written (for example) as L"\x3042" or L"\u3042".

Then convert the wide strings to QString.
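To see why the "\x30\x42" attempt from the question cannot work, here is a small sketch in standard C++. The helper names questionLiteral and byteAt are made up for illustration, and an ASCII-compatible execution character set is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// The first entry of the C++ table in the question: two separate
// one-byte hex escapes, not one 16-bit code unit.
std::string questionLiteral() {
    return "\x30\x42";
}

// Value of the byte at index i, as an unsigned number.
unsigned int byteAt(const std::string& s, std::size_t i) {
    assert(i < s.size());
    return static_cast<unsigned char>(s[i]);
}
```

questionLiteral() is a two-byte string whose bytes are 0x30 and 0x42, which in ASCII are the characters '0' and 'B'. Splitting 0x3042 into two hex escapes therefore yields the text "0B", not the hiragana character.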

Note: Visual C++ will emit silly warnings for the \U notation used in literals, while g++ will complain about the same notation used outside of literals.

Cheers and hth.,

+2

Source: https://habr.com/ru/post/1776582/