Handling char as a byte in Java, different results

Question

Why are the following two results different?

bsh % System.out.println((byte)'\u0080');
-128

bsh % System.out.println("\u0080".getBytes()[0]);
63

Thank you for your responses.

+3

java char binary unicode byte

art1go Feb 14 '11 at 9:00

5 answers

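U+0080 <control> cannot be represented in your platform's default encoding, so it is replaced with ? (ASCII 0x3F = 63) when getBytes() encodes the string.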

+3

axtavt Feb 14 '11 at 9:05

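The result has 2 bytes here, because a Unicode character can take more than 1 byte.

[-62, -128]. That is the case when the encoding is UTF-8. getBytes() without arguments uses the platform default encoding.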

+2

Bozho Feb 14 '11 at 9:04

When you have a character that the character encoding does not support, it turns into '?', which is 63 in ASCII.

Try

System.out.println(Arrays.toString("\u0080".getBytes("UTF-8")));

prints

[-62, -128]
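
The two bytes printed above follow from the UTF-8 two-byte layout (110xxxxx 10xxxxxx). A minimal sketch that builds them by hand and compares the result with the library; the class name Utf8TwoByteDemo and the helper encodeTwoByte are made up for illustration, and StandardCharsets assumes Java 7 or later:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8TwoByteDemo {
    // Hypothetical helper: hand-encodes a code point in U+0080..U+07FF as two UTF-8 bytes.
    static byte[] encodeTwoByte(int codePoint) {
        if (codePoint < 0x80 || codePoint > 0x7FF) {
            throw new IllegalArgumentException("not a two-byte UTF-8 code point");
        }
        byte lead = (byte) (0xC0 | (codePoint >> 6));   // 110xxxxx: upper 5 bits
        byte cont = (byte) (0x80 | (codePoint & 0x3F)); // 10xxxxxx: lower 6 bits
        return new byte[] { lead, cont };
    }

    public static void main(String[] args) {
        byte[] manual = encodeTwoByte(0x80);
        byte[] library = "\u0080".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(manual));        // [-62, -128]
        System.out.println(Arrays.equals(manual, library)); // true
    }
}
```

0xC2 and 0x80 both have the top bit set, so as signed Java bytes they print as -62 and -128.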

+1

Peter Lawrey Feb 14 '11 at 9:08

Actually, if you want to get the same result from the getBytes() call, specify UTF-16LE as the encoding:

bsh % System.out.println("\u0080".getBytes("UTF-16LE")[0]);
-128

Java strings are encoded internally as UTF-16, and since we want the low byte, as with the char → byte cast, we use little endian here. Big endian also works if we change the array index:

bsh % System.out.println("\u0080".getBytes("UTF-16BE")[1]);
-128
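
To see where the low byte sits in each byte order, here is a small sketch that prints the whole byte array instead of a single index. The class name Utf16Demo and the helper encode are made up for illustration, and StandardCharsets assumes Java 7 or later:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf16Demo {
    // Hypothetical helper: encodes a string with an explicitly chosen charset.
    static byte[] encode(String s, Charset cs) {
        return s.getBytes(cs);
    }

    public static void main(String[] args) {
        String s = "\u0080";
        // Little endian: low byte first.
        System.out.println(Arrays.toString(encode(s, StandardCharsets.UTF_16LE))); // [-128, 0]
        // Big endian: high byte first.
        System.out.println(Arrays.toString(encode(s, StandardCharsets.UTF_16BE))); // [0, -128]
        // Plain UTF-16 writes a big-endian byte-order mark (FE FF) before the data.
        System.out.println(Arrays.toString(encode(s, StandardCharsets.UTF_16)));   // [-2, -1, 0, -128]
    }
}
```

Note that plain "UTF-16" would put the -128 at index 3 because of the byte-order mark, which is why the explicit LE or BE variant is the simpler choice here.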

0

Paŭlo Ebermann Feb 14 '11 at 10:36

Michael Borgwardt · Accepted Answer · 2011-02-14T09:13:00+0000

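(byte)'\u0080' just takes the numeric value of the code point, which does not fit in a byte and is therefore subject to a narrowing primitive conversion, which discards the bits that do not fit in the byte and, since the highest bit is set, yields a negative number.

"\u0080".getBytes()[0] converts the characters to bytes according to your platform default encoding (there is an overloaded getBytes() method that lets you specify the encoding). It looks like your platform default encoding cannot represent the code point U+0080 and replaces it with "?" (code point U+003F, decimal value 63).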

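The difference between the cast and getBytes() can be reproduced without depending on the platform default by passing explicit charsets. This is only a sketch: the class name CastVsGetBytes and its two helpers are made up for illustration, US-ASCII stands in for a default encoding that cannot represent U+0080, and StandardCharsets assumes Java 7 or later.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CastVsGetBytes {
    // Narrowing primitive conversion: keeps only the low 8 bits of the char.
    static byte cast(char c) {
        return (byte) c;
    }

    // Hypothetical helper: first byte of the string encoded with the given charset.
    static byte firstByte(String s, Charset cs) {
        return s.getBytes(cs)[0];
    }

    public static void main(String[] args) {
        char c = '\u0080'; // numeric value 128, binary 1000 0000
        System.out.println(cast(c));        // -128: the top bit becomes the sign bit
        System.out.println(cast(c) & 0xFF); // 128: masking recovers the unsigned value

        String s = "\u0080";
        // US-ASCII cannot represent U+0080, so the encoder substitutes '?'.
        System.out.println(firstByte(s, StandardCharsets.US_ASCII));   // 63
        // ISO-8859-1 maps U+0080 to the single byte 0x80.
        System.out.println(firstByte(s, StandardCharsets.ISO_8859_1)); // -128
        // UTF-8 needs two bytes; the first one is 0xC2.
        System.out.println(firstByte(s, StandardCharsets.UTF_8));      // -62

        // The charset that getBytes() without arguments would use on this machine.
        System.out.println(Charset.defaultCharset());
    }
}
```

Only the last line depends on the machine it runs on; the other values are fixed by the cast and by the named charsets.
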