Why was the Python Unicode internal format implemented as described in PEP 100?

Question

http://www.python.org/dev/peps/pep-0100/

PEP 100 states that Python's internal Unicode format holds UTF-16 encodings but addresses the values as if they were UCS-2 (or UCS-4 when Python is compiled with the flag --enable-unicode=ucs4).

Why was UTF-16 (a variable-length format) not chosen instead of UCS-2 (a fixed-length format)?

Although the two encodings are basically the same, UTF-16 was already 4 years old when PEP 100 was published (March 2000). Was Python's Unicode format designed this way to address backward-compatibility issues?

I'm really curious why Python's internal format was implemented using this (apparently) hybrid approach to storing encoded data.

To put my question more precisely: does anyone have a link, or a quote from an official document, that specifically states why PEP 100 decided to treat UTF-16 as UCS-2 instead of using UTF-16 proper?

python encoding unicode utf-16 ucs2

mkelley33 Nov 05 '11 at 20:53

1 answer

John Machin · Answer 1 · 2011-11-05T21:17:01+0000

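Read a little further: "UCS-2 and UTF-16 are the same for all currently defined Unicode character points" ... and this was true in 2000 when the PEP was written. The initial implementation covered only the BMP (Basic Multilingual Plane, the first 64K code points), where every character fits in a single 16-bit unit, so treating the stored values as UCS-2 lost nothing at the time.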
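To see the point concretely, here is a minimal sketch in modern Python 3 (not the Python 2 internals that PEP 100 describes). The helper name `utf16_code_units` is made up for illustration. It shows that a BMP character encodes to one 16-bit unit, identical to its UCS-2 value, while a character outside the BMP needs a surrogate pair, which is the only place UTF-16 and UCS-2 diverge.

```python
def utf16_code_units(text):
    """Return the 16-bit code units of text when encoded as UTF-16.

    Hypothetical helper for illustration only; it is not part of
    PEP 100 or of CPython's internals.
    """
    data = text.encode("utf-16-be")  # big-endian, no byte order mark
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


euro = "\u20ac"        # EURO SIGN, inside the BMP
g_clef = "\U0001d11e"  # MUSICAL SYMBOL G CLEF, outside the BMP

# Inside the BMP: one code unit, equal to the code point (same as UCS-2).
print([hex(u) for u in utf16_code_units(euro)])    # ['0x20ac']

# Outside the BMP: two code units (a surrogate pair); UCS-2 cannot express this.
print([hex(u) for u in utf16_code_units(g_clef)])  # ['0xd834', '0xdd1e']
```

A build that addresses these units as UCS-2 would count the second string as two "characters", which is the behaviour narrow Python 2 builds exposed for non-BMP text.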
