The G clef (U+1D11E) is not part of the Basic Multilingual Plane (BMP), which means it needs more than 16 bits. Almost all Java read functions return either a char or an int that carries only a 16-bit value. Which function reads full Unicode characters, including SMP, SIP, TIP, SSP, and PUA?
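To make the problem concrete, here is a small sketch showing that U+1D11E occupies two UTF-16 code units (a surrogate pair) in Java. The class name `GClefDemo` and the constant `G_CLEF` are made up for this illustration; the `Character` methods are the standard ones.

```java
public class GClefDemo {
    // U+1D11E MUSICAL SYMBOL G CLEF, a supplementary (non-BMP) code point
    public static final int G_CLEF = 0x1D11E;

    public static void main(String[] args) {
        // Character.toChars converts a code point to its UTF-16 representation
        char[] units = Character.toChars(G_CLEF);

        // A supplementary code point needs two chars: high and low surrogate
        System.out.println(units.length);                                  // 2
        System.out.printf("%04X %04X%n", (int) units[0], (int) units[1]);  // D834 DD1E
        System.out.println(Character.isSupplementaryCodePoint(G_CLEF));    // true
    }
}
```

So a single call that returns one 16-bit value can only ever deliver half of this character.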
Update
I asked how to read a single Unicode character (or code point) from an input stream. I have no integer array and don't want to read a string.
You can build a code point with Character.toCodePoint(), but this function requires char arguments. Reading a char directly, on the other hand, is not possible because read() returns an int. My best attempt so far is this, but it still contains unsafe casts:
    public int read_code_point (Reader input) throws java.io.IOException
    {
        int ch16 = input.read();
        if (Character.isHighSurrogate((char)ch16))
            return Character.toCodePoint((char)ch16, (char)input.read());
        else
            return (int)ch16;
    }
How can I do this better?
Update 2
Another version that returns String but still uses casts:
    public String readchar (Reader input) throws java.io.IOException
    {
        int i16 = input.read(); // UTF-16 as int
        if (i16 == -1) return null;
        char c16 = (char)i16; // UTF-16
        if (Character.isHighSurrogate(c16)) {
            int low_i16 = input.read(); // low surrogate UTF-16 as int
            if (low_i16 == -1)
                throw new java.io.IOException ("Can not read low surrogate");
            char low_c16 = (char)low_i16;
            int codepoint = Character.toCodePoint(c16, low_c16);
            return new String (Character.toChars(codepoint));
        }
        else
            return Character.toString(c16);
    }
The remaining question is: are the casts safe, or how can I avoid them?
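For reference, here is a minimal sketch of a variant that returns the code point as an int and checks for end of stream before every cast. It relies on the documented contract of Reader.read(), which returns either -1 or a value in the range 0 to 65535, so a cast to char made after the -1 check should not lose information. The class name `CodePointReader`, the method name `readCodePoint`, and the choice to throw an IOException on a broken surrogate pair are assumptions made for this illustration, not an established API.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class CodePointReader {
    /** Returns the next code point from the reader, or -1 at end of stream. */
    public static int readCodePoint(Reader input) throws IOException {
        int first = input.read();           // 0..0xFFFF, or -1 at end of stream
        if (first == -1) {
            return -1;
        }
        char high = (char) first;           // value is known to fit in 16 bits here
        if (!Character.isHighSurrogate(high)) {
            return first;                   // BMP code unit, returned unchanged
        }
        int second = input.read();
        if (second == -1) {
            throw new IOException("Stream ended after a high surrogate");
        }
        char low = (char) second;
        if (!Character.isLowSurrogate(low)) {
            // Assumption: treat an unpaired high surrogate as an error
            throw new IOException("High surrogate not followed by a low surrogate");
        }
        return Character.toCodePoint(high, low);
    }

    public static void main(String[] args) throws IOException {
        Reader in = new StringReader("a\uD834\uDD1E");
        System.out.println(Integer.toHexString(readCodePoint(in))); // 61
        System.out.println(Integer.toHexString(readCodePoint(in))); // 1d11e
        System.out.println(readCodePoint(in));                      // -1
    }
}
```

Returning -1 for end of stream mirrors what read() itself does, and it cannot collide with a real code point because valid code points are never negative.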