I assume that your console is still running under cmd.exe. I doubt your console really expects UTF-8; more likely it uses a true DOS OEM encoding (e.g. code page 850 or 437).
Java will encode bytes using the default encoding set during JVM initialization.
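To see which default the JVM picked up, and how the same characters become different bytes under different encodings, here is a small demo (my own illustration; the class name and strings are not from the original answer):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class DefaultEncodingDemo {
    public static void main(String[] args) {
        // The default charset is fixed at JVM startup (e.g. via -Dfile.encoding).
        System.out.println("Default charset: " + Charset.defaultCharset());

        // The same string encodes to different bytes under each charset.
        String s = "\u00e4"; // "ä"
        System.out.println("windows-1252: "
                + Arrays.toString(s.getBytes(Charset.forName("windows-1252"))));
        System.out.println("UTF-8:        "
                + Arrays.toString(s.getBytes(StandardCharsets.UTF_8)));
        System.out.println("IBM850:       "
                + Arrays.toString(s.getBytes(Charset.forName("IBM850"))));
    }
}
```

Unless the console happens to decode with the same charset the JVM encodes with, mojibake results.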
Experimenting on my PC:
java Foo
Java output is encoded as windows-1252; the console decodes it as IBM850. Result: mojibake.
java -Dfile.encoding=UTF-8 Foo
Java output is encoded as UTF-8; the console decodes it as IBM850. Result: mojibake.
cat test.txt
cat decodes the file as UTF-8 and encodes its output as IBM850; the console decodes it as IBM850.
java Foo | cat
Java output is encoded as windows-1252; cat decodes it as windows-1252 and encodes its output as IBM850; the console decodes it as IBM850.
java -Dfile.encoding=UTF-8 Foo | cat
Java output is encoded as UTF-8; cat decodes it as UTF-8 and encodes its output as IBM850; the console decodes it as IBM850.
This cat implementation must be using a heuristic to determine whether the character data is UTF-8, then transcoding from either UTF-8 or ANSI (e.g. windows-1252) to the console encoding (e.g. IBM850).
This can be confirmed using the following commands:
$ java HexDump utf8.txt
78 78 c3 a4 c3 b1 78 78
$ cat utf8.txt
xxäñxx
$ java HexDump ansi.txt
78 78 e4 f1 78 78
$ cat ansi.txt
xxäñxx
The cat command can make this determination because e4 f1 is not a valid UTF-8 sequence.
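Such a validity check can be sketched with java.nio's strict decoder (my own illustration, not cat's actual source):

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8Check {
    // Returns true if the byte sequence is well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // c3 a4 is "ä" in UTF-8 -> valid
        System.out.println(isValidUtf8(
                new byte[]{0x78, 0x78, (byte) 0xc3, (byte) 0xa4, 0x78, 0x78}));
        // e4 f1 is "äñ" in windows-1252 but malformed UTF-8 -> invalid
        System.out.println(isValidUtf8(
                new byte[]{0x78, 0x78, (byte) 0xe4, (byte) 0xf1, 0x78, 0x78}));
    }
}
```

Note the heuristic is not foolproof: some ANSI byte sequences also happen to be valid UTF-8.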
You can fix Java output:
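One way to do this (a sketch, assuming the console is using code page 850; check yours with chcp) is to write through a PrintStream configured with the console's encoding instead of the JVM default:

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class ConsoleOut {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // Assumption: the console code page is 850 (OEM); adjust to match chcp.
        PrintStream out = new PrintStream(System.out, true, "IBM850");
        out.println("xx\u00e4\u00f1xx"); // bytes now match what the console decodes
    }
}
```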
HexDump is a trivial Java application:
import java.io.*;

class HexDump {
    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream(args[0])) {
            int r;
            while ((r = in.read()) != -1) {
                System.out.format("%02x ", 0xFF & r);
            }
            System.out.println();
        }
    }
}