I am trying to read Unicode user input in a small Java utility. The problem is that it works on Ubuntu out of the box (I assume because UTF-8 is used everywhere there), but it does not work on Windows when the program is started from cmd. The code in question is as follows:
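```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.nio.charset.Charset;
import java.util.Arrays;

public class SerTest {

    public static void main(String[] args) throws Exception {
        testUnicode();
    }

    public static void testUnicode() throws Exception {
        System.out.println("Default charset: " + Charset.defaultCharset().name());

        // Read one line from stdin, decoding it explicitly as UTF-8
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in, "UTF-8"));
        System.out.printf("Enter 'абвгд эюя': ");
        String line = in.readLine();

        // Compare the bytes of a string literal with the bytes of what was typed
        String s = "абвгд эюя";
        byte[] sBytes = s.getBytes();
        System.out.println("strg bytes: " + Arrays.toString(sBytes));
        byte[] lineBytes = line.getBytes();
        System.out.println("line bytes: " + Arrays.toString(lineBytes));

        // Echo both strings through an explicit UTF-8 PrintStream
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.print("--->" + s + "<----\n");
        out.print("--->" + line + "<----\n");
    }
}
```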
Ubuntu output (without any configuration changes):
```
me@host > javac SerTest.java && java SerTest
Default charset: UTF-8
Enter 'абвгд эюя': абвгд эюя
strg bytes: [-48, -80, -48, -79, -48, -78, -48, -77, -48, -76, 32, -47, -115, -47, -114, -47, -113]
line bytes: [-48, -80, -48, -79, -48, -78, -48, -77, -48, -76, 32, -47, -115, -47, -114, -47, -113]
--->абвгд эюя<----
--->абвгд эюя<----
```
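As a sanity check, the signed byte values printed above can be turned back into a string. The sketch below decodes the Ubuntu "line bytes" as UTF-8 and compares the result with the literal `s`; `DecodeBytes` and `toBytes` are hypothetical names used only for illustration, and the expected text is written with Unicode escapes so the file compiles regardless of the source-file encoding:

```java
import java.nio.charset.StandardCharsets;

public class DecodeBytes {

    // Hypothetical helper: turns the signed values printed by Arrays.toString back into a byte[]
    static byte[] toBytes(int... values) {
        byte[] bytes = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            bytes[i] = (byte) values[i];
        }
        return bytes;
    }

    public static void main(String[] args) {
        // "line bytes" from the Ubuntu run
        byte[] bytes = toBytes(-48, -80, -48, -79, -48, -78, -48, -77, -48, -76,
                32, -47, -115, -47, -114, -47, -113);
        String decoded = new String(bytes, StandardCharsets.UTF_8);

        // The same nine characters as the literal s, written as Unicode escapes
        String expected = "\u0430\u0431\u0432\u0433\u0434 \u044d\u044e\u044f";
        System.out.println(decoded.equals(expected)); // true
        System.out.println(bytes.length + " bytes, " + decoded.length() + " chars"); // 17 bytes, 9 chars
    }
}
```

Each Cyrillic letter takes two bytes in UTF-8 and the space takes one, which accounts for the 17 bytes.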
The output in the Windows cmd prompt (the result is the same with or without JAVA_TOOL_OPTIONS):
```
E:\>chcp 65001
Active code page: 65001

E:\>java -Dfile.encoding=utf8 SerTest
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=utf8
Default charset: UTF-8
Enter 'абвгд эюя': ': ':
strg bytes: [-48, -80, -48, -79, -48, -78, -48, -77, -48, -76, 32, -47, -115, -47, -114, -47, -113]
Exception in thread "main" java.lang.NullPointerException
        at SerTest.testUnicode(SerTest.java:26)   # byte[] lineBytes = line.getBytes();
        at SerTest.main(SerTest.java:15)
```
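The `NullPointerException` on `line.getBytes()` means `in.readLine()` returned `null`. `BufferedReader.readLine()` returns `null` only when the end of the stream has been reached without reading any characters, so in this run `System.in` reported end of input instead of delivering the typed line. A minimal sketch of that behaviour, assuming an empty `ByteArrayInputStream` is a fair stand-in for such a console; `ReadLineEof` and `readRequiredLine` are hypothetical names, and the null check only makes the failure explicit, it does not fix the decoding:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class ReadLineEof {

    // Hypothetical helper: fails with a descriptive message instead of letting null reach getBytes()
    static String readRequiredLine(BufferedReader in) throws IOException {
        String line = in.readLine();
        if (line == null) {
            // readLine() returns null only at end of stream, when no characters were read
            throw new IOException("end of input reached before a line was entered");
        }
        return line;
    }

    public static void main(String[] args) throws IOException {
        // An empty stream simulates a console that reports end of input immediately
        BufferedReader in = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(new byte[0]), StandardCharsets.UTF_8));
        System.out.println(in.readLine()); // null

        try {
            readRequiredLine(in); // still at end of stream, so readLine() returns null again
        } catch (IOException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```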
The output in the Eclipse console (after setting JAVA_TOOL_OPTIONS):
```
Default charset: UTF-8
Enter 'абвгд эюя': абвгд эюя
strg bytes: [-48, -80, -48, -79, -48, -78, -48, -77, -48, -76, 32, -47, -115, -47, -114, -47, -113]
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=utf8
line bytes: [-48, -80, -48, -79, -48, -78, -48, -77, -48, -76, 32, -47, -115, -47, -114, -47, -113]
--->абвгд эюя<----
--->абвгд эюя<----
```
In the Eclipse console it works only because I added a system environment variable (JAVA_TOOL_OPTIONS), which I would like to avoid if possible.
The output in the Eclipse console (after removing JAVA_TOOL_OPTIONS):
Default charset: UTF-8 Enter ' ': strg bytes: [-48, -80, -48, -79, -48, -78, -48, -77, -48, -76, 32, -47, -115, -47, -114, -47, -113] line bytes: [-61, -112, -62, -80, -61, -112, -62, -79, -61, -112, -62, -78, -61, -112, -62, -77, -61, -112, -62, -76, 32, -61, -111, -17, -65, -67, -61, -111, -59, -67, -61, -111, -17, -65, -67] ---> <---- --->ΓΒ°ΓΒ±ΓΒ²ΓΒ³ΓΒ΄ Γ ΓΕ½Γ <----
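The 35 "line bytes" above are consistent with the UTF-8 bytes of the typed text being decoded as windows-1252 and the resulting characters being re-encoded as UTF-8. The sketch below reproduces that under the assumption that windows-1252 is the charset involved (a guess based on it being a common Windows default, not something the output proves); `MojibakeDemo` and `misdecodeThenEncode` are hypothetical names. The bytes 0x8D and 0x8F have no mapping in windows-1252, so Java decodes them to U+FFFD, which is where the `-17, -65, -67` triples come from:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class MojibakeDemo {

    // Hypothetical helper: encodes text as UTF-8, decodes those bytes with the wrong
    // charset, then re-encodes the resulting characters as UTF-8
    static byte[] misdecodeThenEncode(String text, Charset wrongCharset) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        // new String(byte[], Charset) replaces unmappable bytes with U+FFFD
        String misread = new String(utf8, wrongCharset);
        return misread.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The same nine characters as the literal s, written as Unicode escapes
        String typed = "\u0430\u0431\u0432\u0433\u0434 \u044d\u044e\u044f";
        // Assumption: windows-1252 is the charset the input was wrongly decoded with
        byte[] bytes = misdecodeThenEncode(typed, Charset.forName("windows-1252"));
        System.out.println(Arrays.toString(bytes));
    }
}
```

If the assumption holds, the printed array is identical to the "line bytes" of the run without JAVA_TOOL_OPTIONS.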
So my question is: what exactly is going on here? What code changes are required to make this snippet work for all kinds of Unicode input?
Sorry for the long question and thanks in advance,
Sasuke