Unicode input in a Java console application

I am trying to read Unicode user input in my Java application for a small utility. It seems to work out of the box on Ubuntu, which I assume uses UTF-8 throughout, but it does not work on Windows when started from cmd. The code in question is as follows:

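    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.PrintStream;
    import java.nio.charset.Charset;
    import java.util.Arrays;

    public class SerTest {

        public static void main(String[] args) throws Exception {
            testUnicode();
        }

        public static void testUnicode() throws Exception {
            System.out.println("Default charset: " + Charset.defaultCharset().name());

            // Decode standard input explicitly as UTF-8.
            BufferedReader in = new BufferedReader(new InputStreamReader(System.in, "UTF-8"));
            System.out.printf("Enter 'абвгд эюя': ");
            String line = in.readLine();

            // Reference string: the same text as a literal in the source file.
            String s = "абвгд эюя";
            byte[] sBytes = s.getBytes(); // encoded with the default charset
            System.out.println("strg bytes: " + Arrays.toString(sBytes));

            byte[] lineBytes = line.getBytes(); // encoded with the default charset
            System.out.println("line bytes: " + Arrays.toString(lineBytes));

            // Encode standard output explicitly as UTF-8.
            PrintStream out = new PrintStream(System.out, true, "UTF-8");
            out.print("--->" + s + "<----\n");
            out.print("--->" + line + "<----\n");
        }
    }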

Ubuntu output (without any configuration changes):

    me@host > javac SerTest.java && java SerTest
    Default charset: UTF-8
    Enter 'абвгд эюя': абвгд эюя
    strg bytes: [-48, -80, -48, -79, -48, -78, -48, -77, -48, -76, 32, -47, -115, -47, -114, -47, -113]
    line bytes: [-48, -80, -48, -79, -48, -78, -48, -77, -48, -76, 32, -47, -115, -47, -114, -47, -113]
    --->абвгд эюя<----
    --->абвгд эюя<----

Output at the Windows cmd prompt (setting JAVA_TOOL_OPTIONS has no effect at all):

    E:\>chcp 65001
    Active code page: 65001

    E:\>java -Dfile.encoding=utf8 SerTest
    Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=utf8
    Default charset: UTF-8
    Enter 'абвгд эюя': ': ': абвгд эюя
    strg bytes: [-48, -80, -48, -79, -48, -78, -48, -77, -48, -76, 32, -47, -115, -47, -114, -47, -113]
    Exception in thread "main" java.lang.NullPointerException
            at SerTest.testUnicode(SerTest.java:26) # byte[] lineBytes = line.getBytes();
            at SerTest.main(SerTest.java:15)

Output in the Eclipse console (with JAVA_TOOL_OPTIONS set):

    Default charset: UTF-8
    Enter 'абвгд эюя': абвгд эюя
    strg bytes: [-48, -80, -48, -79, -48, -78, -48, -77, -48, -76, 32, -47, -115, -47, -114, -47, -113]
    Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=utf8
    line bytes: [-48, -80, -48, -79, -48, -78, -48, -77, -48, -76, 32, -47, -115, -47, -114, -47, -113]
    --->абвгд эюя<----
    --->абвгд эюя<----

In the Eclipse console it works because I added a system environment variable (JAVA_TOOL_OPTIONS), which I would like to avoid if possible.

Output in the Eclipse console (after removing JAVA_TOOL_OPTIONS):

    Default charset: UTF-8
    Enter 'абвгд эюя': абвгд эюя
    strg bytes: [-48, -80, -48, -79, -48, -78, -48, -77, -48, -76, 32, -47, -115, -47, -114, -47, -113]
    line bytes: [-61, -112, -62, -80, -61, -112, -62, -79, -61, -112, -62, -78, -61, -112, -62, -77, -61, -112, -62, -76, 32, -61, -111, -17, -65, -67, -61, -111, -59, -67, -61, -111, -17, -65, -67]
    --->абвгд эюя<----
    --->абвгд Γ‘ Γ‘Ε½Γ‘ <----

So my question is: what exactly is going on here? What code changes will be required to ensure that this snippet works for all kinds of Unicode input?

Sorry for the long question and thanks in advance,
Sasuke

2 answers

Some notes:

  • -Dfile.encoding=utf8 is not supported and may cause unintended side effects:

The file.encoding property is not required by the J2SE platform specification; it is an internal detail of Sun's implementations and should not be examined or modified by user code. It is also intended to be read-only; it is technically impossible to support setting this property to arbitrary values on the command line or at any other time during program execution.

  • The Console class will detect and use the terminal encoding, but does not support code page 65001 (UTF-8) on Windows. At least it did not the last time I tried it.
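
As a rough sketch of the Console approach from the notes above: prefer System.console() when a console is attached, and fall back to an explicitly decoded System.in when it is not (for example inside an IDE). The class name ConsoleInput and its readLine helper are hypothetical, and decoding the fallback stream as UTF-8 is an assumption that suits a UTF-8 terminal; it does not make code page 65001 work in cmd.exe.

```java
import java.io.BufferedReader;
import java.io.Console;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ConsoleInput {

    // Hypothetical helper: reads one line from the console if there is one,
    // otherwise from the supplied fallback reader. Returns null at end of input.
    static String readLine(Console console, BufferedReader fallback, String prompt) {
        if (console != null) {
            // "%s" keeps a '%' in the prompt from being read as a format specifier.
            return console.readLine("%s", prompt);
        }
        System.out.print(prompt);
        System.out.flush();
        try {
            return fallback.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumption: when no console is attached, stdin carries UTF-8.
        BufferedReader stdin = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8));
        String line = readLine(System.console(), stdin, "Enter text: ");
        System.out.println(line == null
                ? "(no input)"
                : "Read " + line.length() + " chars");
    }
}
```

Passing the Console and the fallback reader in as parameters keeps the decision in one place and lets the fallback path be exercised without a real terminal.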

I believe that the correct, documented way to use Unicode with cmd.exe is to use WriteConsoleW and ReadConsoleW.

I wrote a couple of blog posts when I looked at this:


The NPE is thrown at line.getBytes() (the statement flagged in the stack trace), which means that line itself is null.

BufferedReader.readLine() returns null when the end of the stream is reached, so on Windows the read from System.in ended without delivering a line. getBytes() itself never returns null.
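
Whatever the underlying cause on Windows, the stack trace in the question points at line.getBytes(), and BufferedReader.readLine() is documented to return null at end of stream, so a null check at least turns the crash into a handled case. A minimal sketch, with the hypothetical names SafeRead and nextLine:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Optional;

class SafeRead {

    // Hypothetical helper: wraps readLine() so that "end of input" is an
    // empty Optional instead of a null that blows up later.
    static Optional<String> nextLine(BufferedReader in) {
        try {
            return Optional.ofNullable(in.readLine());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // An empty reader stands in for a console read that yields nothing.
        BufferedReader empty = new BufferedReader(new StringReader(""));
        System.out.println(nextLine(empty).isPresent()); // prints "false"

        BufferedReader one = new BufferedReader(new StringReader("ab\n"));
        byte[] bytes = nextLine(one)
                .map(l -> l.getBytes(StandardCharsets.UTF_8)) // explicit charset
                .orElse(new byte[0]);
        System.out.println(Arrays.toString(bytes)); // prints "[97, 98]"
    }
}
```

This does not make the Windows console deliver the text; it only makes the failure visible as "no input" rather than as a NullPointerException.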

This happens on Windows because the Windows command line does not support Unicode by default. It works on Ubuntu because its terminal uses UTF-8 throughout. It partially works in Eclipse because the Eclipse console is a Java component that supports Unicode for input, and for output as well once JAVA_TOOL_OPTIONS is set.
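
The "line bytes" from the question's last Eclipse run (without JAVA_TOOL_OPTIONS) are consistent with the UTF-8 bytes of the typed text having been decoded as windows-1252 somewhere along the way and then encoded as UTF-8 again. Where exactly that happens is not established here; the sketch below only reproduces the byte pattern. The class name DoubleEncoding and the method name are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class DoubleEncoding {

    // Decodes UTF-8 bytes with the wrong charset (windows-1252) and
    // re-encodes the resulting string as UTF-8.
    static byte[] misdecodeAndReencode(byte[] utf8) {
        String wrong = new String(utf8, Charset.forName("windows-1252"));
        return wrong.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The "strg bytes" from the question: the text as correct UTF-8.
        byte[] typed = {-48, -80, -48, -79, -48, -78, -48, -77, -48, -76, 32,
                        -47, -115, -47, -114, -47, -113};
        // Compare the result with the "line bytes" in the question.
        System.out.println(Arrays.toString(misdecodeAndReencode(typed)));
    }
}
```

Bytes with no windows-1252 character (0x8D and 0x8F here) turn into U+FFFD, which is why the sequence -17, -65, -67 shows up in the garbled output and why the damage cannot be undone afterwards.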

The bottom line is that you need to configure the Windows command line so that it can handle Unicode characters. I have seen some discussions on this topic. Please take a look at this one: "Unicode characters on the Windows command line - how?"

Hope this helps you.


Source: https://habr.com/ru/post/1388464/

