Is it possible to convert UTF32 text to UTF16 using only the Windows API?

I am trying to find out whether converting UTF-32 text to / from any code page is possible using only the Windows API. I cannot use the CLR for this task.

The Code Page Identifiers page on MSDN at http://msdn.microsoft.com/en-us/library/dd317756(VS.85).aspx lists UTF-32 as available only to managed applications.

ConvertStringTo / FromUnicode does not work when using UTF-32.

+4
3 answers

With a little knowledge of Unicode, you can create a UTF32 to UTF16 converter without using any APIs.

For code points in the range U+0000 to U+FFFF, you can simply drop the upper 16 bits.

Values in the range U+10000 to U+10FFFF are converted into two 16-bit words called a surrogate pair:

http://en.wikipedia.org/wiki/UTF-16#Encoding_of_characters_outside_the_BMP
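The two rules above can be sketched in C++ roughly as follows. This is a minimal sketch rather than a hardened converter: the function name utf32ToUtf16 is made up for illustration, and replacing invalid input (surrogate code points or values above U+10FFFF) with U+FFFD is an assumption, not something this answer specifies.

```cpp
#include <string>

// Hypothetical helper: converts a UTF-32 string to UTF-16.
std::u16string utf32ToUtf16(const std::u32string& in) {
    std::u16string out;
    out.reserve(in.size());
    for (char32_t cp : in) {
        // Assumption: invalid code points become U+FFFD (replacement character).
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            cp = 0xFFFD;
        }
        if (cp < 0x10000) {
            // BMP: the upper 16 bits are zero, so one code unit is enough.
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Supplementary planes: split the 20-bit offset into two 10-bit halves.
            char32_t t = cp - 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (t >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (t & 0x3FF)));
        }
    }
    return out;
}
```

For example, U+1F600 comes out as the pair 0xD83D 0xDE00. On Windows, wchar_t is 16 bits wide, so the resulting code units can be copied into a wchar_t buffer for use with the wide-character API.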

+1

You can use the iconv library on Windows. It fully supports UTF-32 (big- and little-endian).
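A minimal sketch of what that could look like, assuming an iconv build that accepts the encoding names "UTF-32LE" and "UTF-16LE" and declares the input pointer as char** (some libiconv versions use const char** instead). The wrapper name utf32leToUtf16le is hypothetical.

```cpp
#include <iconv.h>
#include <stdexcept>
#include <string>

// Hypothetical wrapper around iconv: UTF-32LE bytes in, UTF-16LE bytes out.
std::string utf32leToUtf16le(const std::string& input) {
    iconv_t cd = iconv_open("UTF-16LE", "UTF-32LE");  // arguments are (to, from)
    if (cd == (iconv_t)-1) {
        throw std::runtime_error("iconv_open failed");
    }

    // UTF-16 output never needs more bytes than the UTF-32 input.
    std::string output(input.size(), '\0');
    char* in = const_cast<char*>(input.data());
    size_t inLeft = input.size();
    char* out = &output[0];
    size_t outLeft = output.size();

    size_t rc = iconv(cd, &in, &inLeft, &out, &outLeft);
    iconv_close(cd);
    if (rc == (size_t)-1) {
        throw std::runtime_error("iconv conversion failed");
    }

    // Trim the part of the buffer that iconv did not fill.
    output.resize(output.size() - outLeft);
    return output;
}
```

The byte strings are used here only to keep the sketch independent of the platform's wchar_t size; on Windows the output bytes can be reinterpreted as 16-bit wide characters.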

0

You can use this function, which converts a UTF-32 code point to its UTF-16 equivalent (a single code unit or a surrogate pair, depending on the case). It takes the code point as the first argument and the high and low surrogates as the second and third arguments. The high and low surrogate values are returned by reference.

If the code point is below 0x10000, we simply return it in the low surrogate by reference, and the high surrogate is set to 0.

If the code point is 0x10000 or greater, we calculate the high and low surrogates using the rules described on this Wikipedia page:

https://en.wikipedia.org/wiki/UTF-16#Example_UTF-16_encoding_procedure

Here is the code:

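// Converts one UTF-32 code point to UTF-16.
// h and l receive the high and low surrogates; for BMP characters h is 0
// and l holds the single code unit. The return value packs h into the
// upper 16 bits and l into the lower 16 bits.
// Note: the input is not validated (it should not exceed 0x10FFFF).
unsigned int convertUTF32ToUTF16(unsigned int cUTF32, unsigned int &h, unsigned int &l)
{
    if (cUTF32 < 0x10000) {
        h = 0;
        l = cUTF32;
        return cUTF32;
    }
    unsigned int t = cUTF32 - 0x10000;  // 20-bit offset into the supplementary planes
    h = (t >> 10) + 0xD800;             // top 10 bits -> high surrogate
    l = (t & 0x3FF) + 0xDC00;           // low 10 bits -> low surrogate
    return (h << 16) | (l & 0x0000FFFF);
}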
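Since the question also asks about the reverse direction, the same arithmetic can be inverted. The following is only a sketch: the name convertUTF16ToUTF32 is hypothetical, it assumes the caller passes h == 0 for a character stored in a single code unit (mirroring the convention of the function above), and it does not check that h and l are actually valid surrogates.

```cpp
// Hypothetical inverse of convertUTF32ToUTF16: rebuilds the code point
// from a high/low surrogate pair, or returns l unchanged when h is 0.
unsigned int convertUTF16ToUTF32(unsigned int h, unsigned int l) {
    if (h == 0) {
        return l;  // BMP character stored in a single 16-bit unit
    }
    // Strip the surrogate bases and recombine the two 10-bit halves.
    return 0x10000 + (((h - 0xD800) << 10) | (l - 0xDC00));
}
```

For instance, the pair 0xD83D 0xDE00 maps back to U+1F600.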
0

Source: https://habr.com/ru/post/1285799/

