A System.Net.WebClient request gets 403 Forbidden from some Apache servers, but a browser does not

Oddly, I'm trying to read the <head> section of many different websites, and one type of Apache server sometimes responds with 403 Forbidden. Not all Apache servers do this, so it may be a configuration setting or a specific version of the server.

When I then check the URL in a web browser (such as Firefox), the page loads fine. The code looks something like this:

    var client = new WebClient();
    var stream = client.OpenRead(new Uri("http://en.wikipedia.org/wiki/Barack_Obama"));

Normally, a 403 indicates an access-permission failure, but these are usually unprotected pages. I think Apache is filtering on something in the request headers, since I'm not setting any myself.

Perhaps someone who knows more about Apache can give me some ideas on what is missing in the headers. I would like the headers to be as small as possible to minimize bandwidth.

thanks

+5
4 answers

Try setting the User-Agent header:

    string _UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)";
    client.Headers.Add(HttpRequestHeader.UserAgent, _UserAgent);
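
For the question's stated goal (reading only the <head> section), the header fix can be combined with an early stop on the response stream. The following is a minimal sketch, not a definitive implementation: the names HeadReader, CreateClient and ReadHead, and the default User-Agent value, are hypothetical and not part of the original thread. It reads line by line, so a page delivered as one very long line would still be read in full.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class HeadReader
{
    // Hypothetical identifier string (an assumption); substitute one describing your own client.
    public const string DefaultUserAgent = "Mozilla/5.0 (compatible; HeadReader/1.0)";

    // Creates a WebClient that sends a User-Agent header with each request.
    public static WebClient CreateClient(string userAgent = DefaultUserAgent)
    {
        var client = new WebClient();
        client.Headers.Add(HttpRequestHeader.UserAgent, userAgent);
        return client;
    }

    // Reads lines until one contains the closing </head> tag, then stops,
    // so the remainder of the document is not consumed.
    public static string ReadHead(Stream stream)
    {
        var head = new StringBuilder();
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                head.AppendLine(line);
                if (line.IndexOf("</head>", StringComparison.OrdinalIgnoreCase) >= 0)
                {
                    break;
                }
            }
        }
        return head.ToString();
    }
}
```

A caller would pass the result of client.OpenRead(...) on a client from HeadReader.CreateClient() to HeadReader.ReadHead. Only the User-Agent header is added, which keeps the request small, as the question asks.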
+10

I had a similar problem, and the settings below resolved it.

    Client.Headers["Accept"] = "application/x-ms-application, image/jpeg, application/xaml+xml, image/gif, image/pjpeg, application/x-ms-xbap, application/x-shockwave-flash, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */*";
    Client.Headers["User-Agent"] = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; MDDC)";
+4

It could be the User-Agent header, as "thedugas" said, or it could be anything else that the browser already has configured. For example, you may be required to go through the proxy server that the browser uses, or you may need to supply the correct credentials for that proxy. These things are already set up in the browser, so you would not know that they need to be done in code.
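
If a proxy does turn out to be the missing piece, WebClient can be pointed at one explicitly. This is a rough sketch under assumptions: the class name ProxyClientFactory, the method name Create, and every address or credential a caller passes in are hypothetical and not taken from the thread.

```csharp
using System;
using System.Net;

public static class ProxyClientFactory
{
    // Builds a WebClient that routes requests through the given proxy
    // and authenticates to that proxy with the supplied credentials.
    // All arguments are placeholders supplied by the caller (assumption).
    public static WebClient Create(string proxyUrl, string user, string password)
    {
        var proxy = new WebProxy(new Uri(proxyUrl))
        {
            Credentials = new NetworkCredential(user, password)
        };
        return new WebClient { Proxy = proxy };
    }
}
```

Whether any of this is needed depends on the network the code runs in; comparing against the browser's connection settings is the quickest way to find out.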

+1

I had the same problem, and the answer was not obvious. I found the solution by sniffing the network traffic. When Apache serves its "Testing 1 2 3 ..." page, it returns the HTML together with a 403 Forbidden status code. The browser ignores the code and displays the page, but WebClient throws an exception. The solution is to read the response inside a Try/Catch block. Here is my code:

    Dim Retorno As String = ""
    ' SiteWebClient is the author's own class (not shown here)
    Dim Client As New SiteWebClient
    Client.Headers.Add("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 " & _
                       "(KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134")
    Client.Headers.Add("Accept-Language", "pt-BR, pt;q=0.5")
    Client.Headers.Add("Accept", "text/html, application/xhtml+xml, application/xml;q=0.9,*/*;q=0.8")
    Try
        Retorno = Client.DownloadString("http://" & HostName & SitePath)
    Catch ex As Exception
        If ex.GetType = GetType(System.Net.WebException) Then
            Try
                ' The error response still carries the page body; read it from the exception
                Dim Exception As System.Net.WebException = ex
                Dim Resposta As System.Net.HttpWebResponse = Exception.Response
                Using WebStream As New StreamReader(Resposta.GetResponseStream(), System.Text.Encoding.GetEncoding("utf-8"))
                    Retorno = WebStream.ReadToEnd
                End Using
            Catch ex1 As Exception
                ' No readable response (e.g. no connection); Retorno stays empty
            End Try
        End If
    End Try

After the Try block runs, Retorno will contain the server's HTML response, regardless of which error code the server returns.

The headers do not affect this behavior.
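
Since the question's code is C#, the same idea might look roughly like the sketch below. It is an illustration under assumptions, not the answerer's code: the class name TolerantDownloader and its method names are hypothetical, and UTF-8 is assumed for the error body, as in the VB version. When the failure carries no response at all (for example, the host cannot be reached), the exception is allowed to propagate instead of being swallowed.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class TolerantDownloader
{
    // Returns the response body even when the server answers with an
    // error status such as 403, by reading it from the WebException.
    public static string DownloadIgnoringStatus(WebClient client, string url)
    {
        try
        {
            return client.DownloadString(url);
        }
        catch (WebException ex) when (ex.Response != null)
        {
            // The server did answer; its body is attached to the exception.
            return ReadBody(ex.Response);
        }
    }

    // Reads the whole response body as UTF-8 text and disposes the response.
    public static string ReadBody(WebResponse response)
    {
        using (response)
        using (var stream = response.GetResponseStream())
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

A caller that also needs the status code can inspect ((HttpWebResponse)ex.Response).StatusCode before reading the body.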

0

Source: https://habr.com/ru/post/1302098/

