Why are Encoding.UTF8.GetString and Encoding.UTF8.GetBytes not inverses of each other?

Maybe I am missing something, but I don't understand why Encoding.UTF8.GetString and Encoding.UTF8.GetBytes do not work as inverse transformations of each other.

In the following example, myOriginalBytes and asBytes are not equal; even their lengths differ. Can someone explain what I am missing?

byte[] myOriginalBytes = GetRandomByteArray();
var asString = Encoding.UTF8.GetString(myOriginalBytes);
var asBytes = Encoding.UTF8.GetBytes(asString);
1 answer

They are inverses if you start with a valid UTF-8 byte sequence, but not if you start with an arbitrary byte sequence.

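Take a concrete and very simple example: a single byte, 0xff. That is not the valid UTF-8 encoding of any text. So if you have: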

byte[] bytes = { 0xff };
string text = Encoding.UTF8.GetString(bytes);

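... then text will be a single character, U+FFFD, the Unicode replacement character, which indicates that there was an error decoding the binary data. You get that replacement character for any invalid sequence, so you would get the same text if you started with 0x80, for example. Clearly, if several binary inputs decode to the same text output, the transformation cannot be fully reversible.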
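As a rough sketch of the 0xff and 0x80 case, the snippet below decodes both bytes and then re-encodes the result. It assumes a compiler that supports top-level statements (C# 9 or later); the variable names are illustrative only.

```csharp
using System;
using System.Text;

// Two different invalid inputs: neither 0xff nor a lone 0x80 is valid UTF-8.
byte[] first = { 0xff };
byte[] second = { 0x80 };

string textFromFirst = Encoding.UTF8.GetString(first);
string textFromSecond = Encoding.UTF8.GetString(second);

// Both inputs decode to the same single character, U+FFFD.
Console.WriteLine(textFromFirst == textFromSecond);   // True
Console.WriteLine(textFromFirst[0] == '\uFFFD');      // True

// Re-encoding yields the UTF-8 form of U+FFFD, not the original byte,
// which is also why the lengths differ (3 bytes instead of 1).
byte[] roundTripped = Encoding.UTF8.GetBytes(textFromFirst);
Console.WriteLine(BitConverter.ToString(roundTripped)); // EF-BF-BD
```

Since two distinct inputs collapse to one string, no function could map that string back to both originals.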

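If you have arbitrary binary data, you should not use Encoding to get text from it. Use Convert.ToBase64String instead, or perhaps hex. Encoding is for data that is naturally textual.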
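A minimal sketch of the Base64 route, assuming top-level statements (C# 9 or later). The sample bytes are made up for illustration and include values that are not valid UTF-8.

```csharp
using System;
using System.Linq;

// Arbitrary binary data, including bytes that are invalid as UTF-8.
byte[] original = { 0xff, 0x80, 0x00, 0x41 };

// Base64 maps any byte sequence to text and back without loss.
string encoded = Convert.ToBase64String(original);
byte[] decoded = Convert.FromBase64String(encoded);

Console.WriteLine(encoded);
Console.WriteLine(original.SequenceEqual(decoded)); // True
```

For a hex representation, BitConverter.ToString(original) is one option, though it inserts hyphens between bytes and has no built-in reverse method.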

If you go in the opposite direction, like this:

string text = GetRandomText();
byte[] bytes = Encoding.UTF8.GetBytes(text);
string text2 = Encoding.UTF8.GetString(bytes);

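... I would expect text2 to equal text, except in odd situations where the text is invalid to begin with, for example when it contains "half" of a surrogate pair.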
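The text-to-bytes-to-text direction, including the surrogate exception, might look like the sketch below. It assumes top-level statements (C# 9 or later) and uses made-up sample strings in place of GetRandomText().

```csharp
using System;
using System.Text;

// Well-formed text, including a character outside the BMP (a full surrogate pair).
string text = "héllo \U0001F600";
string text2 = Encoding.UTF8.GetString(Encoding.UTF8.GetBytes(text));
Console.WriteLine(text == text2); // True

// A lone high surrogate is not valid text, so it cannot be encoded as UTF-8.
// Encoding.UTF8 substitutes U+FFFD instead of throwing.
string broken = "\uD83D";
string broken2 = Encoding.UTF8.GetString(Encoding.UTF8.GetBytes(broken));
Console.WriteLine(broken == broken2);   // False
Console.WriteLine(broken2 == "\uFFFD"); // True
```

If silent replacement is undesirable, new UTF8Encoding(false, true) creates an encoding that throws on invalid input instead.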


Source: https://habr.com/ru/post/1682650/

