Converting a String to bytes and back is not bijective if the bytes are changed

The following code inverts each byte of a string and builds a new string from the result.

    public static String convert(String s) {
        byte[] bytes = s.getBytes();
        byte[] convert = new byte[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            convert[i] = (byte) ~bytes[i];
        }
        return new String(convert);
    }

Question: Why is convert() not bijective?

    convert(convert("Test String")).equals("Test String") // evaluates to false
2 answers

When you use the String(byte[]) constructor, it does not map each byte to one character; it decodes the bytes with the platform's default charset. If that is, say, UTF-8, the constructor will try to decode some characters from two, three, or four bytes instead of one, and any byte sequence that is not valid UTF-8 is replaced with the replacement character U+FFFD.

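Since you apply a bitwise complement to every byte, the new byte sequence may not be valid in the default encoding, so decoding it into a String can lose information, and the second call to convert() cannot restore the original.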
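The loss can be made visible with a small sketch. It assumes UTF-8 in place of whatever the platform default actually is, and the class and method names (LossyDecodeDemo, complement, decodeAndEncode) are made up for this illustration. It complements the bytes as in the question, pushes them through a String, and checks whether the same bytes come back.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class LossyDecodeDemo {
    // Complement every byte, as convert() in the question does.
    static byte[] complement(byte[] in) {
        byte[] out = new byte[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = (byte) ~in[i];
        }
        return out;
    }

    // Decode the bytes as UTF-8 (assumed here instead of the platform default)
    // and encode the resulting String back to bytes.
    static byte[] decodeAndEncode(byte[] in) {
        String decoded = new String(in, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] original = "Test String".getBytes(StandardCharsets.UTF_8);
        byte[] flipped = complement(original);
        byte[] afterString = decodeAndEncode(flipped);

        // The complemented bytes do not survive the trip through a String.
        System.out.println(Arrays.equals(flipped, afterString)); // prints "false"
    }
}
```

For "Test String", every complemented byte has its high bit set. Most of them are stray continuation bytes and decode to U+FFFD, while the pair produced by the space and the following "S" happens to form one valid two-byte sequence, which matches the multi-byte decoding described above.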

If you use only ASCII characters, you can try this version of your function:

    // ONLY valid if the input contains nothing but ASCII characters
    // requires: import java.nio.charset.StandardCharsets;
    public static String convert(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.US_ASCII);
        byte[] convert = new byte[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            // mask to 7 bits so the result is still a valid ASCII byte
            convert[i] = (byte) (~bytes[i] & 0x7F);
        }
        return new String(convert, StandardCharsets.US_ASCII);
    }

Because information is lost when the modified bytes are converted to a String and back. This loop changes every byte:

    for (int i = 0; i < bytes.length; i++) { convert[i] = (byte) ~bytes[i]; }

and this line then decodes the changed bytes with the default charset:

    return new String(convert);

If you look at how String is converted to bytes and back, you will find that a Charset and its encoding rules are involved. Read about them and you will get a detailed explanation of this behavior.
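As one illustration of how the choice of Charset matters, here is a sketch that passes ISO-8859-1 explicitly, a charset in which every byte value corresponds to exactly one character. This is not taken from either answer; the class name Latin1Convert is hypothetical, and the round trip only holds under the assumption that the input contains no characters above U+00FF.

```java
import java.nio.charset.StandardCharsets;

class Latin1Convert {
    // Same byte complement as in the question, but with an explicit
    // single-byte charset instead of the platform default.
    // Assumption: s contains only characters in the range U+0000..U+00FF.
    static String convert(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        byte[] out = new byte[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            out[i] = (byte) ~bytes[i];
        }
        return new String(out, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String once = convert("Test String");
        String twice = convert(once);
        System.out.println(twice.equals("Test String")); // prints "true"
    }
}
```

Characters outside ISO-8859-1 would be replaced during getBytes(), so for arbitrary text it is safer to keep the transformed data as a byte[] rather than a String.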


Source: https://habr.com/ru/post/1492439/

