Handling char as a byte in Java, different results

Why are the following two results different?

bsh % System.out.println((byte)'\u0080');
-128

bsh % System.out.println("\u0080".getBytes()[0]);
63

Thank you for your responses.

+3
5 answers

(byte)'\u0080' just takes the numeric value of the code point, which does not fit in a byte and is therefore subject to a narrowing primitive conversion. That conversion discards the bits that do not fit in a byte and, since the highest remaining bit is set, yields a negative number.
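A minimal sketch of that narrowing step may help. The class and method names (NarrowingDemo, narrow, unsigned) are made up for illustration and are not part of any API.

```java
class NarrowingDemo {
    // Narrowing char -> byte keeps only the low 8 bits of the 16-bit value.
    static byte narrow(char c) {
        return (byte) c;
    }

    // Masking with 0xFF reads those same 8 bits back as an unsigned value.
    static int unsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        char c = '\u0080';
        System.out.println((int) c);                    // 128
        System.out.println(Integer.toBinaryString(c));  // 10000000
        System.out.println(narrow(c));                  // -128 (sign bit set)
        System.out.println(unsigned(narrow(c)));        // 128
    }
}
```

The bit pattern is unchanged by the cast; only its interpretation as a signed 8-bit value makes it negative.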
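
"\u0080".getBytes()[0] converts the characters to bytes according to the platform's default encoding (there is an overloaded getBytes() that lets you specify the encoding). It looks like your default encoding cannot represent U+0080 and replaces it with "?" (code point U+003F, decimal 63).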

+5

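The Unicode character U+0080 <control> cannot be represented in your system's default encoding, so it is replaced by ? (ASCII 0x3F = 63) when getBytes() encodes the string.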
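The replacement can be reproduced deterministically by naming a charset that cannot represent U+0080. The sketch below assumes US-ASCII as a stand-in for such a platform default (the asker's actual default encoding is not known), and the class and method names are illustrative only.

```java
import java.nio.charset.StandardCharsets;

class ReplacementDemo {
    // True if US-ASCII has a mapping for the given character.
    static boolean asciiCanEncode(char c) {
        return StandardCharsets.US_ASCII.newEncoder().canEncode(c);
    }

    // getBytes(Charset) substitutes the charset's replacement byte
    // for every character it cannot map.
    static byte[] toAscii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(asciiCanEncode('\u0080'));      // false
        System.out.println(toAscii("\u0080")[0]);          // 63
        System.out.println((char) toAscii("\u0080")[0]);   // ?
    }
}
```

Passing the charset explicitly makes the result independent of the machine the code runs on.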

+3

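The byte array has 2 elements here, because this Unicode character does not fit in 1 byte.

On my machine the array contains [-62, -128]. That is because my default encoding is UTF-8. getBytes() with no arguments uses the platform's default encoding.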
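The values [-62, -128] are the bytes 0xC2 0x80 read as signed bytes. As a rough sketch of where they come from, the hypothetical helper encodeTwoByte below applies the two-byte UTF-8 layout (110xxxxx 10xxxxxx) by hand and compares the result with what getBytes produces for UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8TwoByteDemo {
    // Two-byte UTF-8 form, valid for code points U+0080..U+07FF:
    // lead byte 110xxxxx carries the top 5 bits, continuation 10xxxxxx the low 6.
    static byte[] encodeTwoByte(int codePoint) {
        if (codePoint < 0x80 || codePoint > 0x7FF) {
            throw new IllegalArgumentException("not a two-byte UTF-8 code point");
        }
        byte lead = (byte) (0xC0 | (codePoint >> 6));
        byte cont = (byte) (0x80 | (codePoint & 0x3F));
        return new byte[] { lead, cont };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encodeTwoByte(0x80)));  // [-62, -128]
        System.out.println(Arrays.toString(
                "\u0080".getBytes(StandardCharsets.UTF_8)));       // [-62, -128]
    }
}
```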

+2

When the character encoding does not support a character, that character is turned into '?', which is 63 in ASCII.

Try

System.out.println(Arrays.toString("\u0080".getBytes("UTF-8")));

prints

[-62, -128]
+1

Actually, if you want to get the same result from the getBytes() call, specify UTF-16LE as the charset:

bsh %  System.out.println("\u0080".getBytes("UTF-16LE")[0]); 
-128

Java strings are represented as UTF-16 code units, and since we need the low byte, as in the char → byte cast, we use little-endian here. Big-endian also works if we change the array index:

bsh %  System.out.println("\u0080".getBytes("UTF-16BE")[1]);
-128
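To see both byte orders side by side, here is a small sketch that prints the whole arrays rather than a single element. The helpers lowByte and highByte are illustrative names, not library methods.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf16ByteOrderDemo {
    // Low 8 bits of the 16-bit code unit (what the char -> byte cast keeps).
    static byte lowByte(char c) {
        return (byte) (c & 0xFF);
    }

    // High 8 bits of the 16-bit code unit.
    static byte highByte(char c) {
        return (byte) (c >>> 8);
    }

    public static void main(String[] args) {
        String s = "\u0080";
        // Little-endian: low byte first.
        System.out.println(Arrays.toString(s.getBytes(StandardCharsets.UTF_16LE))); // [-128, 0]
        // Big-endian: high byte first.
        System.out.println(Arrays.toString(s.getBytes(StandardCharsets.UTF_16BE))); // [0, -128]
        System.out.println(lowByte(s.charAt(0)));   // -128
        System.out.println(highByte(s.charAt(0)));  // 0
    }
}
```

Using the StandardCharsets constants instead of charset names also avoids the checked UnsupportedEncodingException that getBytes(String) declares.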
0

Source: https://habr.com/ru/post/1791942/

