UTF8 parsing JSON response from server

I ran into a strange problem while parsing a JSON response from my server. For months it worked fine, receiving responses with Content-Type: text/html, using the following code:

    string response = "";
    using (var client = new System.Net.Http.HttpClient())
    {
        var postData = new System.Net.Http.FormUrlEncodedContent(data);
        var clientResult = await client.PostAsync(url, postData);
        if (clientResult.IsSuccessStatusCode)
        {
            response = await clientResult.Content.ReadAsStringAsync();
        }
    }
    //Parse the response to a JObject...

But when it receives a response with Content-Type: text/html; charset=utf8, it throws an exception saying the content type's character set is invalid.

Exception message: The character set provided in ContentType is invalid. Cannot read content as string using an invalid character set.

So I changed this:

    response = await clientResult.Content.ReadAsStringAsync();

to this:

    var raw_response = await clientResult.Content.ReadAsByteArrayAsync();
    response = Encoding.UTF8.GetString(raw_response, 0, raw_response.Length);

Now I get the response without any exception, but parsing it throws a parsing exception. While debugging, I got this (I shortened the response for testing):

    var r1 = await clientResult.Content.ReadAsStringAsync();
    var raw_response = await clientResult.Content.ReadAsByteArrayAsync();
    var r2 = Encoding.UTF8.GetString(raw_response, 0, raw_response.Length);
    System.Diagnostics.Debug.WriteLine("Length: {0} - {1}", r1.Length, r1);
    System.Diagnostics.Debug.WriteLine("Length: {0} - {1}", r2.Length, r2);

    //Output
    //Length: 38 - {"version":1,"specialword":"C\u00e3o"}
    //Length: 39 - {"version":1,"specialword":"C\u00e3o"}

Both strings look like valid JSON, but their lengths differ, and I couldn't understand why. When I copied the output into Notepad++ to look for hidden characters, a ? appeared:

    Length: 38 - {"version":1,"specialword":"C\u00e3o"}
    Length: 39 - ?{"version":1,"specialword":"C\u00e3o"}

That ? character is what makes the parser throw, but I don't understand why Encoding.UTF8.GetString produces it.

I've been struggling with this for hours, and I really need help.

json http c# .net utf-8

letiagoalves Aug 4 '13 at 12:50

T.J. Crowder · Accepted Answer · 2013-08-04T13:17:45+0000

Well, I'm surprised you're getting this behavior; I would have expected Encoding.UTF8.GetString to handle this for you.

What you're seeing, the character 0xFEFF, is a byte order mark (BOM). A BOM isn't needed in UTF-8, because UTF-8 has no byte-order variants, but it is sometimes used as a marker indicating that the text that follows is UTF-8. (The actual byte sequence is EF BB BF, but when it is decoded as UTF-8 it becomes the code point U+FEFF.)
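To see this concretely, here is a minimal sketch that assumes the server's body is simply the UTF-8 BOM followed by the shortened test JSON from the question. It rebuilds such a body offline and decodes it with Encoding.UTF8.GetString; the result carries one extra leading character, U+FEFF, which would account for the 38 vs. 39 lengths.

```csharp
using System;
using System.Linq;
using System.Text;

// Assumption: the response body is the UTF-8 BOM (EF BB BF) followed by the JSON.
string json = "{\"version\":1,\"specialword\":\"C\\u00e3o\"}";

// Encoding.UTF8.GetPreamble() returns the three BOM bytes EF BB BF.
byte[] body = Encoding.UTF8.GetPreamble()
    .Concat(Encoding.UTF8.GetBytes(json))
    .ToArray();

// GetString decodes every byte it is given, so the BOM survives as a character.
string decoded = Encoding.UTF8.GetString(body, 0, body.Length);

Console.WriteLine($"Length: {json.Length} - {json}");
Console.WriteLine($"Length: {decoded.Length} - {decoded}");
Console.WriteLine($"First char: U+{(int)decoded[0]:X4}");
```

The last line shows that the invisible first character is U+FEFF, which Notepad++ rendered as ? in the question.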

~~If you create your own UTF8Encoding instance, you can specify whether to include or exclude the BOM.~~ (I think I was wrong about that; the setting only controls whether a BOM is emitted when encoding.)
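The corrected statement can be checked directly. In this minimal sketch, the UTF8Encoding(bool encoderShouldEmitUTF8Identifier) constructor changes what GetPreamble() reports (the bytes a writer may emit when encoding), but GetString leaves a leading BOM in the decoded string either way. The two-byte payload "{}" is just illustrative data.

```csharp
using System;
using System.Text;

// The constructor flag only affects the preamble offered for encoding.
var withBom = new UTF8Encoding(encoderShouldEmitUTF8Identifier: true);
var withoutBom = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);

Console.WriteLine(withBom.GetPreamble().Length);    // 3 (EF BB BF)
Console.WriteLine(withoutBom.GetPreamble().Length); // 0

// Decoding ignores the flag: both instances keep the BOM as U+FEFF.
byte[] bytes = { 0xEF, 0xBB, 0xBF, (byte)'{', (byte)'}' };
string a = withBom.GetString(bytes);
string b = withoutBom.GetString(bytes);
Console.WriteLine($"{a.Length} {b.Length}"); // 3 3
```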

Alternatively, you can check for the BOM explicitly and remove it if it's present, for example:

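    var raw_response = await clientResult.Content.ReadAsByteArrayAsync();
    var r2 = Encoding.UTF8.GetString(raw_response, 0, raw_response.Length);
    // Drop a leading BOM (U+FEFF); the length check guards against an empty body.
    if (r2.Length > 0 && r2[0] == '\uFEFF')
    {
        r2 = r2.Substring(1);
    }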
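The same check can also be done on the raw bytes before decoding, so no extra string copy is made. This is a minimal sketch; DecodeUtf8WithoutBom is a hypothetical helper name, not part of .NET.

```csharp
using System;
using System.Text;

// Hypothetical helper: decode UTF-8 bytes, skipping a leading BOM (EF BB BF) if present.
static string DecodeUtf8WithoutBom(byte[] body)
{
    bool hasBom = body.Length >= 3
        && body[0] == 0xEF && body[1] == 0xBB && body[2] == 0xBF;
    int offset = hasBom ? 3 : 0;
    return Encoding.UTF8.GetString(body, offset, body.Length - offset);
}

// Usage: both inputs decode to the same string.
byte[] withBom = { 0xEF, 0xBB, 0xBF, (byte)'{', (byte)'}' };
byte[] withoutBom = { (byte)'{', (byte)'}' };
Console.WriteLine(DecodeUtf8WithoutBom(withBom));    // {}
Console.WriteLine(DecodeUtf8WithoutBom(withoutBom)); // {}
```

In the question's code this would become `response = DecodeUtf8WithoutBom(await clientResult.Content.ReadAsByteArrayAsync());`. If you prefer to work on the string, `r2.TrimStart('\uFEFF')` has the same effect as the Substring check.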
