Encoding differences between WebClient and WebRequest?

Getting some random index of a Spanish newspaper, I don’t get diacritical marks correctly with WebRequest, they give this strange symbol: , when loading a response from the same uri using WebClient I get the corresponding answer.

Why is this differentiation?

    var client = new WebClient();
    string html = client.DownloadString(endpoint);

vs

    WebRequest request = WebRequest.Create(endpoint);
    using (WebResponse response = request.GetResponse())
    {
        Stream stream = response.GetResponseStream();
        StreamReader reader = new StreamReader(stream);
        string html = reader.ReadToEnd();
    }
1 answer

By creating the StreamReader without explicitly setting an encoding, you are implicitly assuming that the response body is UTF-8. You should examine the CharacterSet property of HttpWebResponse (it is not exposed by the WebResponse base class) and open the StreamReader with the matching encoding.
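A minimal sketch of that fix, assuming the server reports a usable charset. The names `CharsetAwareDownload`, `ResolveEncoding` and `DownloadWithCharset` are hypothetical, and falling back to UTF-8 when the charset is missing or unrecognized is a choice made for this example, not something the answer prescribes:

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

static class CharsetAwareDownload
{
    // Hypothetical helper: maps a charset name to an Encoding.
    // Assumption: fall back to UTF-8 when the name is missing or unknown.
    public static Encoding ResolveEncoding(string charset)
    {
        if (string.IsNullOrWhiteSpace(charset))
            return Encoding.UTF8;
        try
        {
            // Some servers quote the charset value, so strip quotes first.
            return Encoding.GetEncoding(charset.Trim().Trim('"'));
        }
        catch (ArgumentException)
        {
            // Thrown when the name is not a supported encoding.
            return Encoding.UTF8;
        }
    }

    public static string DownloadWithCharset(string endpoint)
    {
        WebRequest request = WebRequest.Create(endpoint);
        using (WebResponse response = request.GetResponse())
        {
            // CharacterSet lives on HttpWebResponse, not on WebResponse.
            var httpResponse = response as HttpWebResponse;
            Encoding encoding = ResolveEncoding(httpResponse?.CharacterSet);

            using (Stream stream = response.GetResponseStream())
            using (var reader = new StreamReader(stream, encoding))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

Calling `CharsetAwareDownload.DownloadWithCharset(endpoint)` in place of the WebRequest code from the question should then decode the page with whatever charset the server declares.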

Otherwise, if he reads something that is not UTF-8, as if it were UTF-8, he would encounter octet sequences that are invalid in UTF-8 and should replace the replacement character U + FFFD ( ) as the best he can do.
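The effect can be reproduced without any network access. The sketch below assumes the page was served as ISO-8859-1, which is only a guess about the newspaper in the question; `ReplacementCharacterDemo` and `DecodeLatin1BytesAsUtf8` are hypothetical names:

```csharp
using System;
using System.Text;

static class ReplacementCharacterDemo
{
    // Encodes the text as ISO-8859-1 bytes, then decodes those bytes
    // as if they were UTF-8, mimicking the mismatch in the question.
    public static string DecodeLatin1BytesAsUtf8(string text)
    {
        byte[] latin1Bytes = Encoding.GetEncoding("iso-8859-1").GetBytes(text);
        return Encoding.UTF8.GetString(latin1Bytes);
    }
}
```

For example, `DecodeLatin1BytesAsUtf8("Espa\u00F1a")` encodes ñ as the single byte 0xF1. In UTF-8 that byte can only start a multi-byte sequence, and the following `a` is not a valid continuation byte, so the decoder emits U+FFFD in place of the ñ.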

WebClient does pretty much this for you. DownloadString is a higher-level method, whereas WebRequest and its derived classes let you work at a lower level. It is a single call that means "send a GET request to the URI, check which content encoding is in use in case the body needs to be decompressed, see which character encoding is in place, set up a text reader with that encoding and the stream, and then call ReadToEnd()". The usual trade-offs apply between higher-level calls that do a lot in one step and lower-level calls that each do little.


Source: https://habr.com/ru/post/1393184/

