Different UTF-16 encodings in Java and C#

I'm struggling with different results when converting a string to bytes in C# and in Java.

C#:

byte[] byteArray = Encoding.Unicode.GetBytes("chess ¾");
for (int i = 0; i < byteArray.Length; i++)
    System.Diagnostics.Debug.Write(" " + byteArray[i]);
System.Diagnostics.Debug.WriteLine("");
System.Diagnostics.Debug.WriteLine(Encoding.Unicode.GetString(byteArray));

displays:

99 0 104 0 101 0 115 0 115 0 32 0 190 0
chess ¾

Java:

byte[] byteArray = "chess ¾".getBytes("UTF-16LE");
for (int i = 0; i < byteArray.length; i++)
    System.out.print(" " + (byteArray[i]<0?(-byteArray[i]+128):byteArray[i]));
System.out.println("");
System.out.println(new String(byteArray,"UTF-16LE"));

displays:

99 0 104 0 101 0 115 0 115 0 32 0 194 0
chess ¾

Note that the second-to-last value in the byte array is different! My goal is to encrypt this data and be able to read it from either C# or Java. This difference seems to be an obstacle.

As a side note, before I learned to use Unicode (C#) / UTF-16LE (Java), I was using UTF-8 ...

C#: byte[] byteArray = Encoding.UTF8.GetBytes("chess ¾");

displays: 99 104 101 115 115 32 194 190

Java: byteArray = appName.getBytes("UTF-8");

displays: 99 104 101 115 115 32 190 194

This, oddly enough, swaps the second-to-last and last bytes.

Finally, the Unicode code point for ¾ is 190 (http://www.fileformat.info/info/unicode/char/BE/index.htm), whereas 194 is Â (http://www.fileformat.info/info/unicode/char/00c2/index.htm).


The conversion you use when printing in Java, byteArray[i] < 0 ? (-byteArray[i] + 128) : byteArray[i], is incorrect; a more appropriate one is byteArray[i] & 0xFF. Here is a poc:

    String encoding = "UTF-16LE";
    byte[] byteArray = "chess ¾".getBytes(encoding);
    for (int i = 0; i < byteArray.length; i++) {
        // your conversion
        System.out.print(" " + (byteArray[i] < 0 ? (-byteArray[i] + 128) : byteArray[i]));
        // a more appropriate one
        System.out.print("(" + (byteArray[i] & 0xFF) + ") ");
    }
    System.out.println("");
    System.out.println(new String(byteArray, encoding));
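
As a rough, self-contained variant of the snippet above, the following sketch prints both encodings from the question using unsigned byte values. The class name UnsignedBytesDemo and the helper toUnsignedString are made up for illustration, and StandardCharsets is used here only to avoid the checked UnsupportedEncodingException.

```java
import java.nio.charset.StandardCharsets;

public class UnsignedBytesDemo {
    // Hypothetical helper: render each byte as an unsigned value (0..255),
    // separated by single spaces.
    static String toUnsignedString(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Java bytes are signed (-128..127); masking with 0xFF yields the
            // unsigned value, the same result as Byte.toUnsignedInt(b).
            sb.append(b & 0xFF);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "chess \u00BE"; // \u00BE is the ¾ character
        System.out.println(toUnsignedString(text.getBytes(StandardCharsets.UTF_16LE)));
        System.out.println(toUnsignedString(text.getBytes(StandardCharsets.UTF_8)));
    }
}
```

Printed this way, the Java values should line up with the C# output from the question (ending in 190 0 for UTF-16LE and 194 190 for UTF-8), which suggests the bytes themselves were the same and only the display conversion differed.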

UTF-16LE encodes each character in 2 or 4 bytes.

The character here is ¾, "VULGAR FRACTION THREE QUARTERS". Its code point is 190, and its UTF-8 encoding is 194 190 (11000010 10111110).
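
To make the relationship between 190 and 194 concrete, here is a minimal sketch of the two-byte UTF-8 layout applied to code point 190 (U+00BE). The class Utf8TwoByteDemo and the method encodeTwoByte are hypothetical names for illustration; the sketch only covers the two-byte range U+0080..U+07FF, not UTF-8 in general.

```java
public class Utf8TwoByteDemo {
    // Hypothetical helper: encode a code point in U+0080..U+07FF as two
    // UTF-8 bytes, returned as unsigned values.
    static int[] encodeTwoByte(int codePoint) {
        if (codePoint < 0x80 || codePoint > 0x7FF) {
            throw new IllegalArgumentException("not a two-byte UTF-8 code point");
        }
        int lead = 0xC0 | (codePoint >> 6);    // 110xxxxx carries the upper bits
        int trail = 0x80 | (codePoint & 0x3F); // 10xxxxxx carries the low 6 bits
        return new int[] { lead, trail };
    }

    public static void main(String[] args) {
        int[] bytes = encodeTwoByte(0xBE); // 0xBE == 190, the ¾ character
        System.out.println(bytes[0] + " " + bytes[1]);
        System.out.println(Integer.toBinaryString(bytes[0]) + " "
                + Integer.toBinaryString(bytes[1]));
    }
}
```

Under these assumptions the lead byte works out to 194 and the trailing byte to 190, so 194 shows up as the UTF-8 prefix byte for ¾ rather than as a separate Â character.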

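Each element of a byte[] holds 1 byte. For the UTF-8 output in the question, the second-to-last byte is displayed as 194 in C# and as 190 in Java.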


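In Java, getBytes("UTF-16") produces big-endian output, preceded by a byte order mark.

In C#, System.Text.Encoding.Unicode.GetBytes produces little-endian output.

So in Java, use getBytes("UTF-16LE") to get little-endian bytes that match this.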
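
A small sketch of how the three UTF-16 charsets in Java differ for the single character ¾ follows. The class Utf16ByteOrderDemo and its hex helper are illustrative names, not part of any library; only the StandardCharsets constants come from the JDK.

```java
import java.nio.charset.StandardCharsets;

public class Utf16ByteOrderDemo {
    // Hypothetical helper: format bytes as two-digit uppercase hex values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u00BE"; // the ¾ character, code point 190 (0xBE)
        // "UTF-16" encodes big-endian and prepends a byte order mark (FE FF).
        System.out.println("UTF-16:   " + hex(text.getBytes(StandardCharsets.UTF_16)));
        // "UTF-16LE" and "UTF-16BE" fix the byte order and write no mark.
        System.out.println("UTF-16LE: " + hex(text.getBytes(StandardCharsets.UTF_16LE)));
        System.out.println("UTF-16BE: " + hex(text.getBytes(StandardCharsets.UTF_16BE)));
    }
}
```

Of the three, only the UTF-16LE bytes have the low byte first and no mark, which is the layout shown in the C# output of the question.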



Source: https://habr.com/ru/post/1619122/

