I'm having trouble reading supplementary Unicode characters in Java. I have a file that may contain characters from the supplementary planes (code points above \uFFFF). When I set up my InputStreamReader to read the file as UTF-8, I would expect the read() method to return one value per supplementary character, but instead it seems to split each one at the 16-bit boundary.
I have seen other questions about basic Unicode character handling, but none seem to deal with code points that need more than 16 bits.
Here is a simplified code example:
    InputStreamReader input = new InputStreamReader(file, "UTF8");
    int nextChar = input.read();
    while (nextChar != -1) {
        ...
        nextChar = input.read();
    }
Does anyone know what I need to do to correctly read a UTF-8 encoded file that contains supplementary characters?
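
For illustration, here is a minimal sketch of one way the 16-bit values might be recombined into full code points. It relies on the documented behavior that Reader.read() returns UTF-16 code units, so a supplementary character arrives as a high surrogate followed by a low surrogate. The class name CodePointReader and the method readCodePoints are hypothetical names chosen for this example, and the in-memory byte array stands in for the real file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original question.
public class CodePointReader {

    // Reads UTF-16 code units from the reader and pairs surrogates
    // into single code points where a valid pair is found.
    static List<Integer> readCodePoints(Reader reader) throws IOException {
        List<Integer> codePoints = new ArrayList<>();
        int unit = reader.read();
        while (unit != -1) {
            char current = (char) unit;
            if (Character.isHighSurrogate(current)) {
                int next = reader.read();
                if (next != -1 && Character.isLowSurrogate((char) next)) {
                    // Valid surrogate pair: combine into one code point.
                    codePoints.add(Character.toCodePoint(current, (char) next));
                    unit = reader.read();
                } else {
                    // Unpaired high surrogate: keep it as-is and
                    // reprocess the following unit on the next pass.
                    codePoints.add(unit);
                    unit = next;
                }
            } else {
                codePoints.add(unit);
                unit = reader.read();
            }
        }
        return codePoints;
    }

    public static void main(String[] args) throws IOException {
        // "A" followed by U+1D11E (MUSICAL SYMBOL G CLEF), encoded as UTF-8.
        byte[] utf8 = "A\uD834\uDD1E".getBytes(StandardCharsets.UTF_8);
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(utf8), StandardCharsets.UTF_8)) {
            for (int codePoint : readCodePoints(reader)) {
                System.out.println(String.format("U+%04X", codePoint));
            }
        }
    }
}
```

With this sample input, main prints U+0041 and then U+1D11E, rather than the two separate surrogate values that a plain read() loop would show. Wrapping the InputStreamReader in a BufferedReader should work the same way, since the helper only depends on Reader.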