http://www.python.org/dev/peps/pep-0100/
PEP 100 states that Python's internal Unicode format holds UTF-16 encodings, but addresses the values as UCS-2 (or UCS-4 when compiled with the --enable-unicode=ucs4 flag).
Why was UTF-16 (a variable-length format) not chosen, as opposed to UCS-2 (a fixed-length format)?
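To make the distinction concrete, here is a minimal sketch of what I mean by "holds UTF-16 but addresses it as UCS-2". It is written for Python 3, and the helper name `utf16_code_units` is my own, not anything from PEP 100. It only shows how a character outside the Basic Multilingual Plane becomes two 16-bit code units (a surrogate pair); it does not reproduce the interpreter's internal storage.

```python
import struct


def utf16_code_units(text):
    """Return the 16-bit code units of `text` when encoded as UTF-16.

    Hypothetical helper for illustration only.
    """
    raw = text.encode("utf-16-le")  # little-endian, no byte order mark
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))


bmp_char = "\u20ac"         # EURO SIGN, inside the BMP
astral_char = "\U0001d11e"  # MUSICAL SYMBOL G CLEF, outside the BMP

bmp_units = utf16_code_units(bmp_char)
astral_units = utf16_code_units(astral_char)

# One code unit for the BMP character, two (a surrogate pair) for the other.
print([hex(u) for u in bmp_units])     # ['0x20ac']
print([hex(u) for u in astral_units])  # ['0xd834', '0xdd1e']
```

If indexing and length are defined per 16-bit unit, as in UCS-2 addressing, the second string has length 2 and each index yields half of a surrogate pair. As far as I understand, that is what narrow builds of Python 2 did.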
Although the two encodings are mostly the same, UTF-16 had already been around for 4 years when PEP 100 was published (March 2000). Was Python's Unicode support designed this way to address backward compatibility issues?
I'm really curious why Python's internal format was implemented with this (apparently) hybrid approach to storing encoded data.
A better way to ask my question: does anyone have a link, ideally with a quote, to an official document that states specifically why PEP 100 decided to treat UTF-16 as UCS-2 instead of using UTF-16 proper?