Convert from ASCII to Unicode char code (FreeType2)

Question


I am using FreeType2 in one of my projects. To render a glyph, I need to provide a two-byte Unicode character code. The character codes my program reads are in single-byte ASCII format. This is not a problem for character codes below 128 (the codes are identical), but the remaining 128 do not match. For instance:

'a' in ASCII is 0x61, 'a' in Unicode is 0x0061 - that's fine
'ą' in ASCII is 0xB9, 'ą' in Unicode is 0x0105

I tried to use WinAPI functions here, but I must be doing something wrong. Here's a sample:

    unsigned char szTest1[] = "ąółź"; //ASCII format
    wchar_t* wszTest2;
    int size = MultiByteToWideChar(CP_UTF8, 0, (char*)szTest1, 4, NULL, 0);
    printf("size = %d\n", size);
    wszTest2 = new wchar_t[size];
    MultiByteToWideChar(CP_UTF8, 0, (char*)szTest1, 4, wszTest2, size);
    printf("HEX: %x\n", wszTest2[0]);
    delete[] wszTest2;

I expect a new wide string to be created, without a NULL terminator. However, the variable size is always 0. Any idea what I'm doing wrong? Or maybe there is an easier way to solve the problem?

+4

c++ unicode ascii freetype

Tomalla Oct 16 '12 at 14:32


2 answers

The pure ASCII character set is limited to the range 0-127 (7 bits). 8-bit characters with the most significant bit set (i.e. in the range 128-255) are not uniquely defined: their meaning depends on the code page. So your character ą (LATIN SMALL LETTER A WITH OGONEK) is represented by the value 0xB9 in one specific code page, which should be Windows-1250. In other code pages, the value 0xB9 maps to a different character (for example, in Windows-1252, 0xB9 is ¹, i.e. SUPERSCRIPT ONE).
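To make the code-page dependence concrete, here is a minimal portable sketch that maps a handful of byte values to Unicode code points by hand. The enum `CodePage` and the function `toCodePoint` are hypothetical names invented for this illustration (they are not Win32 types or APIs), and the table covers only the four letters from the question; a real conversion should rely on a complete table or on the operating system.

```cpp
#include <cassert>

// Hypothetical enum for this sketch; not a Win32 type.
enum class CodePage { Windows1250, Windows1252 };

// Maps a single byte to a Unicode code point for a few hand-picked entries.
// Returns U+FFFD (REPLACEMENT CHARACTER) for bytes this sketch does not cover.
char32_t toCodePoint(unsigned char byte, CodePage page)
{
    if (byte < 0x80) {
        return byte; // 7-bit ASCII is identical in both code pages
    }
    if (page == CodePage::Windows1250) {
        switch (byte) {
            case 0xB9: return 0x0105; // ą
            case 0xF3: return 0x00F3; // ó
            case 0xB3: return 0x0142; // ł
            case 0x9F: return 0x017A; // ź
            default:   break;
        }
    } else { // CodePage::Windows1252
        switch (byte) {
            case 0xB9: return 0x00B9; // ¹
            case 0xF3: return 0x00F3; // ó
            case 0xB3: return 0x00B3; // ³
            case 0x9F: return 0x0178; // Ÿ
            default:   break;
        }
    }
    return 0xFFFD; // not covered by this partial table
}

// Small usage example: the same byte yields different characters.
void codePageExample()
{
    assert(toCodePoint(0xB9, CodePage::Windows1250) == 0x0105);
    assert(toCodePoint(0xB9, CodePage::Windows1252) == 0x00B9);
}
```

The same byte 0xB9 comes out as U+0105 under one code page and as U+00B9 under the other, which is why a byte string alone is not enough: the conversion also needs to know which code page produced it.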

To convert characters from a specific code page to Unicode UTF-16 using the Win32 API, you can use MultiByteToWideChar, specifying the correct code page (which is not CP_UTF8 as in the code in your question; CP_UTF8 identifies Unicode UTF-8). You can try specifying 1250 (ANSI Central European; Central European (Windows)) as the code page identifier.

If you have access to ATL in your code, you can use the ATL string conversion helper classes such as CA2W, which wraps the call to MultiByteToWideChar() and the memory allocation in an RAII class; e.g.:

    #include <atlconv.h> // ATL String Conversion Helpers

    // 'test' is a Unicode UTF-16 string.
    // Conversion is done from code-page 1250
    // (ANSI Central European; Central European (Windows))
    CA2W test("ąółź", 1250);

Now you can use the test string in your Unicode APIs.

If you do not have access to ATL, or you want a C++ STL-based solution, you may want to consider code like this:

    ///////////////////////////////////////////////////////////////////////////////
    //
    // Modern STL-based C++ wrapper to Win32 MultiByteToWideChar() C API.
    //
    // (based on http://code.msdn.microsoft.com/windowsdesktop/C-UTF-8-Conversion-Helpers-22c0a664)
    //
    ///////////////////////////////////////////////////////////////////////////////

    #include <exception>    // for std::exception
    #include <iostream>     // for std::cout
    #include <ostream>      // for std::endl
    #include <stdexcept>    // for std::runtime_error
    #include <string>       // for std::string and std::wstring
    #include <Windows.h>    // Win32 Platform SDK

    //-----------------------------------------------------------------------------
    // Define an exception class for string conversion error.
    //-----------------------------------------------------------------------------
    class StringConversionException
        : public std::runtime_error
    {
    public:
        // Creates exception with error message and error code.
        StringConversionException(const char* message, DWORD error)
            : std::runtime_error(message)
            , m_error(error)
        {}

        // Creates exception with error message and error code.
        StringConversionException(const std::string& message, DWORD error)
            : std::runtime_error(message)
            , m_error(error)
        {}

        // Windows error code.
        DWORD Error() const
        {
            return m_error;
        }

    private:
        DWORD m_error;
    };

    //-----------------------------------------------------------------------------
    // Converts an ANSI/MBCS string to Unicode UTF-16.
    // Wraps MultiByteToWideChar() using modern C++ and STL.
    // Throws a StringConversionException on error.
    //-----------------------------------------------------------------------------
    std::wstring ConvertToUTF16(const std::string & source, const UINT codePage)
    {
        // Fail if an invalid input character is encountered
        static const DWORD conversionFlags = MB_ERR_INVALID_CHARS;

        // Require size for destination string
        const int utf16Length = ::MultiByteToWideChar(
            codePage,           // code page for the conversion
            conversionFlags,    // flags
            source.c_str(),     // source string
            source.length(),    // length (in chars) of source string
            NULL,               // unused - no conversion done in this step
            0                   // request size of destination buffer, in wchar_t's
            );
        if (utf16Length == 0)
        {
            const DWORD error = ::GetLastError();
            throw StringConversionException(
                "MultiByteToWideChar() failed: Can't get length of destination UTF-16 string.",
                error);
        }

        // Allocate room for destination string
        std::wstring utf16Text;
        utf16Text.resize(utf16Length);

        // Convert to Unicode UTF-16
        if ( ! ::MultiByteToWideChar(
            codePage,           // code page for conversion
            0,                  // validation was done in previous call
            source.c_str(),     // source string
            source.length(),    // length (in chars) of source string
            &utf16Text[0],      // destination buffer
            utf16Text.length()  // size of destination buffer, in wchar_t's
            ))
        {
            const DWORD error = ::GetLastError();
            throw StringConversionException(
                "MultiByteToWideChar() failed: Can't convert to UTF-16 string.",
                error);
        }

        return utf16Text;
    }

    //-----------------------------------------------------------------------------
    // Test.
    //-----------------------------------------------------------------------------
    int main()
    {
        // Error codes
        static const int exitOk = 0;
        static const int exitError = 1;

        try
        {
            // Test input string:
            //
            // ą - LATIN SMALL LETTER A WITH OGONEK
            std::string inText("x - LATIN SMALL LETTER A WITH OGONEK");
            inText[0] = 0xB9;

            // ANSI Central European; Central European (Windows) code page
            static const UINT codePage = 1250;

            // Convert to Unicode UTF-16
            const std::wstring utf16Text = ConvertToUTF16(inText, codePage);

            // Verify conversion.
            // ą - LATIN SMALL LETTER A WITH OGONEK
            // --> Unicode UTF-16 0x0105
            // http://www.fileformat.info/info/unicode/char/105/index.htm
            if (utf16Text[0] != 0x0105)
            {
                throw std::runtime_error("Wrong conversion.");
            }

            std::cout << "All right." << std::endl;
        }
        catch (const StringConversionException& e)
        {
            std::cerr << "*** ERROR:\n";
            std::cerr << e.what() << "\n";
            std::cerr << "Error code = " << e.Error();
            std::cerr << std::endl;
            return exitError;
        }
        catch (const std::exception& e)
        {
            std::cerr << "*** ERROR:\n";
            std::cerr << e.what();
            std::cerr << std::endl;
            return exitError;
        }

        return exitOk;
    }

    ///////////////////////////////////////////////////////////////////////////////

+5

Mr.C64 Oct 16 '12 at 15:02


shf301 · Accepted Answer · 2012-10-16T14:40:25+0000

The CodePage parameter passed to MultiByteToWideChar is wrong. UTF-8 is not the same as ASCII. You should use CP_ACP, which indicates the current system code page (which is not the same as ASCII either; see Unicode, UTF, ASCII, ANSI format differences).

The size is zero most likely because your test string is not a valid UTF-8 string.
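To see why those bytes cannot be UTF-8, here is a minimal portable sketch of a structural UTF-8 check. The function name `isValidUtf8` is hypothetical (it is not a Win32 or standard library function), and the byte values used for the test string assume the source file was saved in Windows-1250, as the 0xB9 value in the question suggests. This only illustrates the encoding rules; it makes no claim about how MultiByteToWideChar validates its input internally.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns true if `bytes` is well-formed UTF-8: correct lead/continuation
// structure, no overlong forms, no surrogates, nothing above U+10FFFF.
bool isValidUtf8(const std::string& bytes)
{
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0; // continuation bytes expected after the lead
        char32_t cp = 0;       // code point being assembled
        char32_t minCp = 0;    // smallest code point allowed for this length

        if (lead < 0x80)                { extra = 0; cp = lead;        minCp = 0; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; minCp = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; minCp = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; minCp = 0x10000; }
        else return false; // stray continuation byte (10xxxxxx) or invalid lead

        if (n - i - 1 < extra) return false; // sequence is truncated

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) return false; // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }

        if (cp < minCp) return false;                    // overlong encoding
        if (cp > 0x10FFFF) return false;                 // beyond Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate

        i += extra + 1;
    }
    return true;
}

// Small usage example with "ąółź" in two encodings.
void utf8Example()
{
    // Windows-1250 bytes: 0xB9 is a bare continuation byte in UTF-8 terms.
    assert(!isValidUtf8("\xB9\xF3\xB3\x9F"));
    // The same text encoded as UTF-8 (two bytes per letter).
    assert(isValidUtf8("\xC4\x85\xC3\xB3\xC5\x82\xC5\xBA"));
}
```

The very first byte, 0xB9, has the bit pattern 10xxxxxx, which in UTF-8 can only appear after a lead byte, so the string is rejected immediately.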

For almost all Win32 functions, you can call GetLastError() after the function fails to get a detailed error code, so calling it here would give you more information.
