No Unicode streams in C++0x? Why?

Today I found out that the C++ standards committee dropped Unicode stream support from C++0x in the second revision. See this question for more information.

According to this document:

The rationale for leaving out stream specializations for the two new types was that streams of non-char types have not attracted wide usage, so it is not clear that there is a real need to double the number of specializations of this very complicated machinery.

From this interview with Stroustrup:

Obviously, we ought to have Unicode streams and other extended Unicode support in the standard library. The committee knew this, but did not have anyone with the skills and time to do the work, so unfortunately this is one of the many areas where you need to look for "third party" support.

I don't know much about Unicode, so I wonder: why is adding Unicode streams so complicated? What makes it problematic?

+6
2 answers

The first passage you quoted already tells you: it is not that Unicode streams in particular are more complicated than other streams, it is that iostreams in general are extremely complex. Implementing Unicode iostreams is difficult not because they are Unicode, but because they are iostreams.

+5

Document N2238 dates from 2007 and is out of date. I'm not sure exactly what Stroustrup is referring to in the interview, but this is not news.

N3242 §22.5 still requires codecvt_utf8 and codecvt_utf16, which are all you need for Unicode file input/output. imbue the right facet on wcout and you should be good to go ... if you have a conforming library. In practice, though, GCC and MSVC already handle UTF-8, and I would expect every serious C++ platform to maintain parity between mbstowcs and codecvt.
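
As a rough illustration of the imbue approach, here is a minimal sketch that writes a wide string to a file as UTF-8. The helper name write_utf8 and its parameters are made up for this example. It assumes a library that ships <codecvt>; note that std::codecvt_utf8 was later deprecated in C++17, so newer compilers may warn about it.

```cpp
#include <codecvt>  // std::codecvt_utf8 (deprecated since C++17)
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: writes `text` to `path` encoded as UTF-8.
// Returns true if the stream is still in a good state after flushing.
bool write_utf8(const std::string& path, const std::wstring& text) {
    std::wofstream out;
    // Imbue before opening, so the conversion facet is in place before any I/O.
    // The locale object takes ownership of the facet allocated with new.
    out.imbue(std::locale(out.getloc(), new std::codecvt_utf8<wchar_t>));
    out.open(path, std::ios::binary);
    if (!out) return false;
    out << text;
    out.flush();
    return out.good();
}
```

Calling write_utf8("out.txt", L"caf\u00e9") should store the last character as the two bytes 0xC3 0xA9. The same imbue call on std::wcout is the variant the answer mentions; whether the terminal then displays the result correctly depends on the platform.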

There may be confusion because N3242 §22.5/5 says

- The multibyte sequences may be written only as a binary file. Attempting to write to a text file produces undefined behavior.

This is because text-mode input/output translates line endings, so a byte 0x0A that is merely one half of a 16-bit UTF-16 code unit may be converted to 0x0D, 0x0A, corrupting the stream. This has nothing to do with poor support ... just remember to open the file in binary mode, as with any library that provides this functionality.
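
To make the corruption concrete, here is a small sketch that models what a Windows-style text-mode stream does to UTF-16 bytes. Both function names are hypothetical, and text_mode_translate is only a simplified model of newline translation (line feed 0x0A becomes carriage return 0x0D plus line feed 0x0A), not what any particular runtime literally executes.

```cpp
#include <string>

// Hypothetical helper: the two big-endian UTF-16 bytes of one BMP code unit.
std::string utf16be_bytes(char16_t unit) {
    std::string bytes;
    bytes += static_cast<char>((unit >> 8) & 0xFF);  // high byte first
    bytes += static_cast<char>(unit & 0xFF);         // then low byte
    return bytes;
}

// Simplified model of text-mode output on a CR LF platform:
// every 0x0A byte is expanded to 0x0D 0x0A, regardless of context.
std::string text_mode_translate(const std::string& bytes) {
    std::string out;
    for (char c : bytes) {
        if (c == '\x0A') out += '\x0D';
        out += c;
    }
    return out;
}
```

For U+010A, utf16be_bytes yields the bytes 0x01 0x0A. Passing them through text_mode_translate gives 0x01 0x0D 0x0A, which is three bytes and therefore no longer a sequence of whole 16-bit units. Opening the file with std::ios::binary skips this translation step.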

+3

Source: https://habr.com/ru/post/885924/

