Why are Encoding.UTF8.GetString and Encoding.UTF8.GetBytes not inverses of each other?

Question

Maybe I missed something, but I don't understand why Encoding.UTF8.GetString and Encoding.UTF8.GetBytes do not work as inverse transformations of each other.

In the following example, myOriginalBytes and asBytes are not equal; even their lengths differ. Can someone explain what I am missing?

byte[] myOriginalBytes = GetRandomByteArray();
var asString = Encoding.UTF8.GetString(myOriginalBytes);
var asBytes = Encoding.UTF8.GetBytes(asString);

+4

c# .net utf-8

g.pickardou Jul 31 '17 at 7:53

1 answer

Jon Skeet · Accepted Answer · 2017-07-31T07:55:37+0000

They are inverses if you start with a valid UTF-8 byte sequence, but not if you start with an arbitrary byte sequence.

Let's take a concrete and very simple example: a single byte, 0xff. That is not the valid UTF-8 encoding of any text. So if you have:

byte[] bytes = { 0xff };
string text = Encoding.UTF8.GetString(bytes);

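... you end up with text being a single character, U+FFFD, the Unicode replacement character, which is used to indicate that there was an error decoding the binary data. You get that replacement character for any invalid sequence, so you would get the same text if you started with 0x80, for example. Clearly, if multiple binary inputs are decoded to the same textual output, the transformation cannot be fully reversible.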
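The single-byte case can be checked directly. The following is only a minimal sketch: the class name InvalidBytesDemo and the helper RoundTrip are made up for illustration and are not part of the original answer. It also shows where the length difference in the question comes from: U+FFFD itself encodes to three bytes in UTF-8.

```csharp
using System;
using System.Linq;
using System.Text;

public static class InvalidBytesDemo
{
    // Hypothetical helper: decode the bytes as UTF-8, then encode the string again.
    public static byte[] RoundTrip(byte[] bytes) =>
        Encoding.UTF8.GetBytes(Encoding.UTF8.GetString(bytes));

    public static void Main()
    {
        // Neither 0xff nor a lone continuation byte 0x80 is valid UTF-8.
        string fromFf = Encoding.UTF8.GetString(new byte[] { 0xff });
        string from80 = Encoding.UTF8.GetString(new byte[] { 0x80 });

        // Both inputs decode to the same replacement character, U+FFFD.
        Console.WriteLine(fromFf == "\uFFFD"); // True
        Console.WriteLine(fromFf == from80);   // True

        // Re-encoding U+FFFD gives three bytes, not the original single byte.
        Console.WriteLine(BitConverter.ToString(RoundTrip(new byte[] { 0xff }))); // EF-BF-BD
    }
}
```

Since two different inputs map to the same string, no decoder could recover the original byte from that string.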

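If you have arbitrary binary data, you should not use Encoding to get text from it. Use Convert.ToBase64String instead, or perhaps hex. Encoding is for data that is naturally textual.

If you go in the opposite direction, like this: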
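As a rough illustration of the Base64 route, here is a small sketch. Base64Demo and RoundTrip are hypothetical names chosen for this example; Convert.ToBase64String and Convert.FromBase64String are the standard .NET methods. The sample bytes are arbitrary and include values that are not valid UTF-8.

```csharp
using System;
using System.Linq;

public static class Base64Demo
{
    // Hypothetical helper: bytes -> Base64 text -> bytes.
    public static byte[] RoundTrip(byte[] bytes)
    {
        string text = Convert.ToBase64String(bytes);
        return Convert.FromBase64String(text);
    }

    public static void Main()
    {
        // Arbitrary binary data, including bytes that are invalid as UTF-8.
        byte[] original = { 0xff, 0x80, 0x00, 0x41 };
        byte[] restored = RoundTrip(original);

        // Base64 is lossless for any byte sequence.
        Console.WriteLine(original.SequenceEqual(restored)); // True
    }
}
```

The intermediate string is longer than the input, but it uses only ASCII characters and always decodes back to the same bytes.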

string text = GetRandomText();
byte[] bytes = Encoding.UTF8.GetBytes(text);
string text2 = Encoding.UTF8.GetString(bytes);

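... I would expect text2 to be equal to text, except in odd cases where you have invalid text to start with, for example with "half" surrogate pairs.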
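The string-first direction, including the surrogate edge case, could be sketched as follows. SurrogateDemo and RoundTrip are illustrative names, not from the original answer, and the sample strings are arbitrary. The sketch assumes the default Encoding.UTF8 instance, which replaces invalid input rather than throwing.

```csharp
using System;
using System.Text;

public static class SurrogateDemo
{
    // Hypothetical helper: string -> UTF-8 bytes -> string.
    public static string RoundTrip(string text) =>
        Encoding.UTF8.GetString(Encoding.UTF8.GetBytes(text));

    public static void Main()
    {
        // Valid text, including a character outside the BMP (a full surrogate pair).
        string valid = "h\u00e9llo \U0001F600";
        Console.WriteLine(RoundTrip(valid) == valid); // True

        // A lone high surrogate is not valid UTF-16 text.
        string lone = new string((char)0xD800, 1);
        string result = RoundTrip(lone);

        // The encoder substitutes U+FFFD, so the original string is lost.
        Console.WriteLine(result == lone);     // False
        Console.WriteLine(result == "\uFFFD"); // True
    }
}
```

If silent replacement is not acceptable, constructing the encoding with new UTF8Encoding(false, true) makes invalid input throw an exception instead.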
