Why is a string in C encoded differently on computers running Windows and Linux?
First, this is not a Windows/Linux (operating system) issue but a compiler one: there are compilers on Windows that encode strings the same way gcc (common on Linux) does.
This is allowed by C, and the two compiler vendors made different implementation choices according to their own goals: MS uses CP-1252 and gcc on Linux uses Unicode (UTF-8). @Danh: MS's choice predates Unicode. Not surprisingly, different compiler vendors use different solutions.
5.2.1 Character Sets
1 Two sets of characters and their associated collating sequences shall be defined: the set in which source files are written (the source character set), and the set interpreted in the execution environment (the execution character set). Each set is further divided into a basic character set, whose contents are given by this subclause, and a set of zero or more locale-specific members (which are not members of the basic character set) called extended characters. The combined set is also called the extended character set. The values of the members of the execution character set are implementation-defined. C11dr §5.2.1 1 (My emphasis)
strlen("ö") = 1 versus strlen("ö") = 2
"ö" is encoded according to the extended characters of the compiler's source character set: one byte (0xF6) in CP-1252, two bytes (0xC3 0xB6) in UTF-8.
I suspect that MS is focused on maintaining its existing code base and encourages other languages. Linux is simply an earlier adopter of Unicode in C, even though MS was an early Unicode influencer.
As Unicode support grows, I expect this to be the solution of the future.