How to convert UTF-16 to UTF-32 and print the resulting wchar_t string in C?

I am trying to print a string of UTF-16 characters. I posted a related question a while ago, and the advice was to convert the string to UTF-32 using iconv and print it as a wchar_t string.

I did some research and managed to code the following:

    // *c is the pointer to the characters (UTF-16) i'm trying to print
    // sz is the size in bytes of the input i'm trying to print

    iconv_t icv;
    char in_buf[sz];
    char* in;
    size_t in_sz;
    char out_buf[sz * 2];
    char* out;
    size_t out_sz;

    icv = iconv_open("UTF-32", "UTF-16");

    memcpy(in_buf, c, sz);

    in = in_buf;
    in_sz = sz;
    out = out_buf;
    out_sz = sz * 2;

    size_t ret = iconv(icv, &in, &in_sz, &out, &out_sz);
    printf("ret = %d\n", ret);
    printf("*** %ls ***\n", ((wchar_t*) out_buf));

Calling iconv always returns 0, so I think the conversion should be OK?

However, printing seems to be hit and miss. Sometimes the converted wchar_t string prints fine. Other times, printing the wchar_t string seems to abort the printf call entirely, so that even the trailing "***" is not printed.

I also tried using

    wprintf(((wchar_t*) "*** %ls ***\n"), out_buf);

but nothing is printed.

Am I missing something?

Reference: How to print UTF-16 characters in C?

UPDATE

I have incorporated some suggestions from the comments.

Updated code:

    // *c is the pointer to the characters (UTF-16) i'm trying to print
    // sz is the size in bytes of the input i'm trying to print

    iconv_t icv;
    char in_buf[sz];
    char* in;
    size_t in_sz;
    wchar_t out_buf[sz / 2];
    char* out;
    size_t out_sz;

    icv = iconv_open("UTF-32", "UTF-16");

    memcpy(in_buf, c, sz);

    in = in_buf;
    in_sz = sz;
    out = (char*) out_buf;
    out_sz = sz * 2;

    size_t ret = iconv(icv, &in, &in_sz, &out, &out_sz);
    printf("ret = %d\n", ret);
    printf("*** %ls ***\n", out_buf);
    wprintf(L"*** %ls ***\n", out_buf);

Still the same result: not all UTF-16 strings get printed (with either printf or wprintf).

What else could I be missing?

By the way, I am on Linux and have verified that wchar_t is 4 bytes.

1 answer

Here is a short program that converts UTF-16 to a wide character array and then prints it.

    #include <endian.h>
    #include <errno.h>
    #include <iconv.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <wchar.h>

    #define FROMCODE "UTF-16"

    /* pick the UTF-32 variant that matches the in-memory layout of wchar_t */
    #if (BYTE_ORDER == LITTLE_ENDIAN)
    #define TOCODE "UTF-32LE"
    #elif (BYTE_ORDER == BIG_ENDIAN)
    #define TOCODE "UTF-32BE"
    #else
    #error Unsupported byte order
    #endif

    int main(void) {
        void *tmp;
        char *outbuf;
        const char *inbuf;
        long converted = 0;
        wchar_t *out = NULL;
        int status = EXIT_SUCCESS;
        size_t n, inbytesleft, outbytesleft, size;
        /* "Hello, World!" in UTF-16LE, preceded by a byte order mark */
        const char in[] = {
            (char)0xff, (char)0xfe,
            'H', 0x0,
            'e', 0x0,
            'l', 0x0,
            'l', 0x0,
            'o', 0x0,
            ',', 0x0,
            ' ', 0x0,
            'W', 0x0,
            'o', 0x0,
            'r', 0x0,
            'l', 0x0,
            'd', 0x0,
            '!', 0x0
        };
        iconv_t cd = iconv_open(TOCODE, FROMCODE);

        if ((iconv_t)-1 == cd) {
            if (EINVAL == errno) {
                fprintf(stderr, "iconv: cannot convert from %s to %s\n",
                        FROMCODE, TOCODE);
            } else {
                fprintf(stderr, "iconv: %s\n", strerror(errno));
            }
            goto error;
        }

        size = sizeof(in) * sizeof(wchar_t);
        inbuf = in;
        inbytesleft = sizeof(in);

        while (1) {
            /* one extra wchar_t leaves room for the terminator */
            tmp = realloc(out, size + sizeof(wchar_t));
            if (!tmp) {
                fprintf(stderr, "realloc: %s\n", strerror(errno));
                goto error;
            }
            out = tmp;
            outbuf = (char *)out + converted;
            outbytesleft = size - converted;

            /* iconv returns size_t, so failure is (size_t)-1 */
            n = iconv(cd, (char **)&inbuf, &inbytesleft, &outbuf, &outbytesleft);
            if ((size_t)-1 == n) {
                if (EINVAL == errno) {
                    /* junk at the end of the buffer, ignore it */
                    break;
                } else if (E2BIG != errno) {
                    /* unrecoverable error */
                    fprintf(stderr, "iconv: %s\n", strerror(errno));
                    goto error;
                }
                /* increase the size of the output buffer */
                converted = size - outbytesleft;
                size <<= 1;
            } else {
                /* done */
                break;
            }
        }

        converted = (size - outbytesleft) / sizeof(wchar_t);
        out[converted] = L'\0';
        fprintf(stdout, "%ls\n", out);

        /* flush the iconv buffer */
        iconv(cd, NULL, NULL, &outbuf, &outbytesleft);

    exit:
        if (out) {
            free(out);
        }
        /* only close a descriptor that was opened successfully */
        if (cd != (iconv_t)-1) {
            iconv_close(cd);
        }
        exit(status);

    error:
        status = EXIT_FAILURE;
        goto exit;
    }
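A detail the program above does not exercise, because its sample text is pure ASCII: printf converts `%ls` arguments to multibyte output using the current locale, and a program starts in the "C" locale, where a non-ASCII character may make printf fail and return a negative value. A minimal sketch of the usual setup follows. The helper names `init_locale` and `print_wide_line` are hypothetical, and it assumes the environment selects a UTF-8 locale (for example through LANG).

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: adopt the locale named by the environment
 * (LANG, LC_ALL, ...). Returns 1 on success, 0 if the locale is unknown. */
int init_locale(void)
{
    return setlocale(LC_ALL, "") != NULL;
}

/* Hypothetical helper: print a wide string plus a newline.
 * Returns the number of bytes written, or a negative value if printf
 * fails, e.g. because a character has no representation in the
 * current locale's encoding. */
int print_wide_line(const wchar_t *s)
{
    assert(s != NULL);
    return printf("%ls\n", s);
}
```

Call `init_locale()` once at the start of main and check the return value of every `print_wide_line()` call; a negative result points at the locale rather than at the conversion. It is also safer not to mix printf and wprintf on the same stream.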

Since UTF-16 is a variable-length encoding, you cannot know in advance exactly how large the output buffer needs to be; you can only estimate. A correct program must handle the case where the output buffer is too small to hold the converted data.
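To make the variable-length point concrete, here is a minimal sketch that decodes UTF-16 by hand instead of through iconv. It assumes the input is already an array of 16-bit code units in native byte order with no byte order mark, and the function name `utf16_to_utf32` is hypothetical. A surrogate pair uses two code units to encode one code point, so the number of output elements depends on the content of the input, not just on its size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode n UTF-16 code units into UTF-32 code points.
 * Returns the number of code points written, or (size_t)-1 if the
 * input contains an unpaired surrogate. `out` must have room for n
 * elements: at most one code point is produced per code unit. */
size_t utf16_to_utf32(const uint16_t *in, size_t n, uint32_t *out)
{
    size_t i = 0, count = 0;

    assert(in != NULL && out != NULL);
    while (i < n) {
        uint32_t u = in[i++];

        if (u >= 0xD800 && u <= 0xDBFF) {
            /* high surrogate: a low surrogate must follow */
            if (i >= n || in[i] < 0xDC00 || in[i] > 0xDFFF)
                return (size_t)-1;
            u = 0x10000 + ((u - 0xD800) << 10) + (in[i++] - 0xDC00);
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            /* low surrogate without a preceding high surrogate */
            return (size_t)-1;
        }
        out[count++] = u;
    }
    return count;
}
```

For this particular direction the sketch also suggests an upper bound: the output never has more elements than the input has code units. With iconv, a plain "UTF-32" target may additionally emit a byte order mark, depending on the implementation, so handling E2BIG remains the robust approach.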

Also note that iconv does not null-terminate the output buffer for you.
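One way to deal with the missing terminator is to reserve one extra element and write it after the conversion, using the advanced output pointer to find the end. The following is a sketch under assumptions, not a definitive implementation: the helper name `utf16le_to_wcs` is hypothetical, the input is assumed to be UTF-16LE without a byte order mark, and the "WCHAR_T" encoding name is assumed to be available (glibc and GNU libiconv accept it).

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>
#include <wchar.h>

/* Convert in_bytes bytes of UTF-16LE text into a newly allocated,
 * null-terminated wchar_t string. Returns NULL on any failure.
 * The caller frees the result. */
wchar_t *utf16le_to_wcs(const char *in, size_t in_bytes)
{
    /* one wchar_t per 2-byte code unit at most, plus the terminator */
    size_t cap = in_bytes / 2 + 1;
    wchar_t *out;
    char *src = (char *)in;   /* iconv takes char **, the data is not modified */
    char *dst;
    size_t src_left = in_bytes, dst_left;
    iconv_t cd;

    assert(in != NULL);
    cd = iconv_open("WCHAR_T", "UTF-16LE");
    if (cd == (iconv_t)-1)
        return NULL;

    out = malloc(cap * sizeof *out);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }
    dst = (char *)out;
    dst_left = (cap - 1) * sizeof *out;   /* keep the last slot free */

    if (iconv(cd, &src, &src_left, &dst, &dst_left) == (size_t)-1) {
        free(out);
        iconv_close(cd);
        return NULL;
    }
    iconv_close(cd);

    /* dst now points just past the last converted character */
    out[(size_t)(dst - (char *)out) / sizeof *out] = L'\0';
    return out;
}
```

The returned string can be passed to `%ls` or to functions such as wcslen, which all rely on the terminator being present.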

iconv is a stream-oriented converter, so you need to reset the iconv_t if you want to reuse it for another conversion (the example code does this near the end). If you want to process a stream, you must handle the EINVAL error by copying any bytes remaining in the input buffer to the beginning of the new input buffer before calling iconv again.
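The EINVAL handling for streams can be sketched roughly as follows. The `u16_stream` struct, the `u16_stream_feed` function and the `CHUNK_MAX` limit are hypothetical names introduced for illustration. The sketch assumes the caller opens the descriptor, sets `npending` to 0 before the first call, and always provides enough output space (an E2BIG failure is simply reported as an error here).

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <string.h>

enum { CHUNK_MAX = 256 };   /* assumed upper bound on one input chunk */

struct u16_stream {
    iconv_t cd;             /* opened by the caller with iconv_open */
    char pending[4];        /* bytes of an incomplete character */
    size_t npending;        /* number of valid bytes in pending */
};

/* Convert one chunk, appending output at *out and updating *out_left.
 * Bytes of a character cut off at the end of the chunk are saved and
 * placed in front of the next chunk. Returns 0 on success, -1 on error. */
int u16_stream_feed(struct u16_stream *s, const char *chunk, size_t len,
                    char **out, size_t *out_left)
{
    char buf[sizeof s->pending + CHUNK_MAX];
    char *in = buf;
    size_t in_left;

    assert(len <= CHUNK_MAX && s->npending <= sizeof s->pending);

    /* leftover bytes first, then the new chunk */
    memcpy(buf, s->pending, s->npending);
    memcpy(buf + s->npending, chunk, len);
    in_left = s->npending + len;
    s->npending = 0;

    if (iconv(s->cd, &in, &in_left, out, out_left) == (size_t)-1) {
        if (errno != EINVAL || in_left > sizeof s->pending)
            return -1;
        /* incomplete character at the end: keep it for the next call */
        memcpy(s->pending, in, in_left);
        s->npending = in_left;
    }
    return 0;
}
```

If `npending` is still nonzero after the final chunk, the stream ended in the middle of a character, which corresponds to the "junk at the end of the buffer" case in the program above.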


Source: https://habr.com/ru/post/1383702/

