How can I get the same string from different byte data in Java?

I found a strange problem when converting bytes to a UTF-8 string in Java. Why are bytes1 and bytes2 different, while str1 and str2 are equal?

Here is the test code.

import org.apache.commons.codec.binary.Hex;

public class MyTest {
    public static void main(String[] args) throws Exception {
        byte[] bytes1 = Hex.decodeHex("EDA0BDEDB88A".toCharArray());
        byte[] bytes2 = Hex.decodeHex("F09F988A".toCharArray());

        System.out.println("bytes1 length: " + bytes1.length);
        System.out.println("bytes2 length: " + bytes2.length);

        String str1 = new String(bytes1, "utf8");
        String str2 = new String(bytes2, "utf8");

        System.out.println("str1 is equals str2? " + str1.equals(str2));
    }
}

Here is the result of running the test code on JDK 7:

bytes1 length: 6
bytes2 length: 4
str1 is equals str2? true

What is the relationship between "EDA0BDEDB88A" and "F09F988A"?

"F09F988A" is the UTF-8 encoding of a Unicode smiley face, but I don't know what "EDA0BDEDB88A" is.

1 answer

Both byte sequences, F09F988A and EDA0BDEDB88A, are decoded by Java to the same code point, U+1F60A (SMILING FACE WITH SMILING EYES).
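As a quick check of that claim, here is a minimal sketch that starts from the code point U+1F60A and prints both its standard UTF-8 bytes and its UTF-16 code units. The class name SmileyEncodings and the helper methods toHex, utf8Hex and utf16Units are made up for this illustration; only standard JDK calls are used.

```java
import java.nio.charset.StandardCharsets;

class SmileyEncodings {
    // Format a byte array as upper-case hex, two digits per byte.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Standard UTF-8 encoding of a single code point, as hex.
    static String utf8Hex(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return toHex(s.getBytes(StandardCharsets.UTF_8));
    }

    // UTF-16 code units of a single code point, as space-separated hex.
    static String utf16Units(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char c : Character.toChars(codePoint)) {
            sb.append(String.format("%04X ", (int) c));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println("UTF-8 bytes:  " + utf8Hex(0x1F60A));    // F09F988A
        System.out.println("UTF-16 units: " + utf16Units(0x1F60A)); // D83D DE0A
    }
}
```

The UTF-8 bytes match bytes2 from the question. The two UTF-16 code units are the surrogate pair that Java stores internally for this character.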

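EDA0BDEDB88A is not strictly valid UTF-8, but Java's UTF-8 decoder (at least on the JDK 7 used in the question) accepts it: it is the UTF-8-style encoding of the two surrogates U+D83D and U+DE0A, which together form the UTF-16 surrogate pair for U+1F60A. Standard UTF-8 does not allow surrogates to be encoded individually, so this byte sequence is not UTF-8 but CESU-8 (each UTF-16 code unit encoded separately with the UTF-8 scheme).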
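The relationship between the two byte sequences can be verified with plain bit arithmetic, without relying on any particular decoder. The following is a sketch only: the class name Cesu8Demo and the helpers decodeThreeBytes and decodeFourBytes are hypothetical, and they do no validation of the input bytes.

```java
class Cesu8Demo {
    // Decode one 3-byte sequence of the form 1110xxxx 10xxxxxx 10xxxxxx
    // into its 16-bit payload (no validity checks).
    static int decodeThreeBytes(int b0, int b1, int b2) {
        return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F);
    }

    // Decode one 4-byte sequence of the form 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    // into its code point (no validity checks).
    static int decodeFourBytes(int b0, int b1, int b2, int b3) {
        return ((b0 & 0x07) << 18) | ((b1 & 0x3F) << 12)
             | ((b2 & 0x3F) << 6) | (b3 & 0x3F);
    }

    public static void main(String[] args) {
        // EDA0BD and EDB88A each decode to one UTF-16 surrogate.
        int high = decodeThreeBytes(0xED, 0xA0, 0xBD);
        int low = decodeThreeBytes(0xED, 0xB8, 0x8A);
        System.out.println(String.format("high=U+%04X low=U+%04X", high, low));
        // prints: high=U+D83D low=U+DE0A

        // Combining the surrogate pair gives the supplementary code point.
        int fromPair = Character.toCodePoint((char) high, (char) low);

        // F09F988A decodes directly to the same code point.
        int direct = decodeFourBytes(0xF0, 0x9F, 0x98, 0x8A);
        System.out.println(String.format("fromPair=U+%X direct=U+%X", fromPair, direct));
        // prints: fromPair=U+1F60A direct=U+1F60A
    }
}
```

In other words, the six-byte form encodes the two halves of the surrogate pair separately, while the four-byte form encodes the code point itself.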


Source: https://habr.com/ru/post/1527479/

