A short answer to the question "Q1: how does it represent Unicode code points above U+FFFF?": you need to know how UTF-16 works and handle its surrogate code points correctly. The information and links below should give you pointers and sample code to do that.
The NSString documentation is correct. However, although you said that "NSString says it uses UTF-16 encoding for internal use," it is more accurate to say that the public/abstract interface of NSString is based on UTF-16. The difference is that this leaves the internal string representation as a private implementation detail, while public methods like characterAtIndex: and length always work in terms of UTF-16 code units.
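For example, here is a minimal sketch (assuming a Foundation command-line tool; the sample character U+1F600 is just an illustrative choice) showing that length and characterAtIndex: report UTF-16 code units rather than Unicode code points:

```objc
#import <Foundation/Foundation.h>

int main(void) {
    @autoreleasepool {
        // U+1F600 (GRINNING FACE) lies above U+FFFF, so NSString exposes it
        // as a UTF-16 surrogate pair: 0xD83D followed by 0xDE00.
        NSString *s = @"\U0001F600";

        // length counts UTF-16 code units, not Unicode code points.
        NSLog(@"length = %lu", (unsigned long)[s length]);              // 2, not 1

        // characterAtIndex: returns individual UTF-16 code units (unichar).
        NSLog(@"unit 0 = 0x%04X", (unsigned int)[s characterAtIndex:0]); // 0xD83D
        NSLog(@"unit 1 = 0x%04X", (unsigned int)[s characterAtIndex:1]); // 0xDE00
    }
    return 0;
}
```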
The reason for this is that it strikes a good balance between ASCII-centric strings and full Unicode strings, largely because Unicode is a strict superset of ASCII (ASCII uses 7 bits for 128 characters, which map to the first 128 Unicode code points).
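A tiny sketch of that overlap (again assuming a Foundation command-line tool): for an ASCII-only string, each character is one UTF-16 code unit whose value is the ASCII code itself.

```objc
#import <Foundation/Foundation.h>

int main(void) {
    @autoreleasepool {
        // Each ASCII character occupies exactly one UTF-16 code unit whose
        // value equals its ASCII (and Unicode) code point.
        NSString *ascii = @"Hi!";
        for (NSUInteger i = 0; i < [ascii length]; i++) {
            NSLog(@"unit %lu = U+%04X",
                  (unsigned long)i,
                  (unsigned int)[ascii characterAtIndex:i]);
        }
        // unit 0 = U+0048 ('H'), unit 1 = U+0069 ('i'), unit 2 = U+0021 ('!')
    }
    return 0;
}
```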
For Unicode code points > U+FFFF, which obviously exceed what can be represented in a single UTF-16 code unit, UTF-16 uses special "surrogate code points" that combine into a "surrogate pair", which together encode a single Unicode code point > U+FFFF. You can find more information about this in the Unicode standard's description of UTF-16 and surrogate pairs.
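As a rough illustration of the mechanics (a sketch, not production code; the range checks and bit arithmetic follow the UTF-16 rules, and the sample character U+1F600 is my choice, not the answer's), a surrogate pair read back through characterAtIndex: can be recombined into the original code point like this:

```objc
#import <Foundation/Foundation.h>
#include <stdint.h>

int main(void) {
    @autoreleasepool {
        // U+1F600 is > U+FFFF, so NSString exposes it as two UTF-16 code units.
        NSString *s = @"\U0001F600";
        unichar high = [s characterAtIndex:0];   // high surrogate: 0xD800-0xDBFF
        unichar low  = [s characterAtIndex:1];   // low surrogate:  0xDC00-0xDFFF

        if (high >= 0xD800 && high <= 0xDBFF &&
            low  >= 0xDC00 && low  <= 0xDFFF) {
            // UTF-16 decoding rule: take the 10 payload bits of each surrogate
            // and add 0x10000 to recover the original code point.
            uint32_t codePoint = 0x10000u
                               + (((uint32_t)(high - 0xD800)) << 10)
                               + (uint32_t)(low - 0xDC00);
            NSLog(@"code point = U+%04X", codePoint);   // U+1F600
        }
    }
    return 0;
}
```

CoreFoundation also ships helpers for this, such as CFStringIsSurrogateHighCharacter, CFStringIsSurrogateLowCharacter, and CFStringGetLongCharacterForSurrogatePair, so in practice you rarely need to write the arithmetic yourself.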