Odd redirect location causes proxy error with urllib2

I am using urllib2 to send an HTTP POST request with Python 2.7.3. The request raises an HTTPError exception (HTTP Error 502: Proxy Error).

Looking at the message traffic with Charles, I see that the following happens:

  • I send an HTTP request (POST /index.asp?action=login HTTP/1.1) using urllib2
  • The remote server responds with status 303 and a Location header of ../index.asp?action=news
  • urllib2 follows the redirect with a GET request: (GET /../index.asp?action=news HTTP/1.1)
  • The remote server responds with status 502 (Proxy Error)

The 502 response includes this in its body: "DNS lookup error for: 10.0.0.30:80index.asp" (note the malformed URL)

So I take it that a proxy server on the remote server's network sees the URL "/../index.asp" in the request and misinterprets it, forwarding my request with a bad URL.

When I make the same request with my browser (Chrome), the redirected request is sent as GET /index.asp?action=news. So Chrome strips the leading "/.." from the URL, and the remote server returns a valid response.

Is this a bug in urllib2? Is there something I can do to make the redirected request ignore the "/.." in the URL? Or is there another way to solve this problem? Thinking it might be a urllib2 bug, I switched from urllib2 to requests, but requests gave the same result. Of course, that may be because requests is built on urllib2.

Thanks for any help.

1 answer

The Location sent with this 303 is incorrect in several ways.

First, if you read RFC 2616 (HTTP/1.1), section 14.30 defines Location as an absolute URI, not a relative one. Section 10.3.4, which covers 303 responses, makes it clear that this is the relevant definition.

Secondly, even if relative URIs were allowed, RFC 1808, Relative Uniform Resource Locators, section 4 (Resolving Relative URLs), step 6, only specifies special handling for .. in the pattern <segment>/../. This means a .. is removed only when a real path segment precedes it. With a base URL of http://example.com/foo/bar/ and a relative URL of ../baz/, the resolved URL is http://example.com/foo/baz/. But with a base URL of http://example.com/index.asp and a relative URL of ../index.asp, there is no segment left for the .. to cancel, so the resolved URL is not http://example.com/index.asp but http://example.com/../index.asp. (Of course, most servers will treat the two the same way, but that is up to each server.)
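As a rough illustration of the step 6 rule, here is a minimal sketch of the path merging in Python. The function name resolve_path_rfc1808 is made up for this example. It looks only at the path (no query, parameters or fragment) and assumes an absolute base path and a non-empty relative path that does not start with "/".

```python
def resolve_path_rfc1808(base_path, rel_path):
    """Merge a relative path onto a base path, roughly as in RFC 1808 step 6.

    Hypothetical helper for illustration only; not part of urllib2.
    """
    # Drop everything after the last '/' of the base path, then append.
    merged = base_path[:base_path.rfind('/') + 1] + rel_path
    # Skip the empty string that precedes the leading '/'.
    segments = merged.split('/')[1:]
    out = []
    for i, seg in enumerate(segments):
        last = (i == len(segments) - 1)
        if seg == '.':
            # A '.' segment is simply removed.
            if last:
                out.append('')
        elif seg == '..' and out and out[-1] != '..':
            # '<segment>/../' cancels out, but only if a segment precedes it.
            out.pop()
            if last:
                out.append('')
        else:
            # Ordinary segment, or a '..' with nothing left to cancel.
            out.append(seg)
    return '/' + '/'.join(out)


# The case from the question: nothing precedes '..', so it survives.
print(resolve_path_rfc1808('/index.asp', '../index.asp'))  # /../index.asp
# Modeled on the "abnormal" examples in RFC 1808 section 5.2.
print(resolve_path_rfc1808('/b/c/d', '../../../g'))        # /../g
```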

Finally, even if you did combine the relative and base URLs before resolving .., an absolute URI whose path starts with .. is invalid.

So, the error is in the server configuration.

Now, it just so happens that many user agents work around this error. In particular, they turn /../foo into /foo to block users (or arbitrary JS running on their behalf without their knowledge) from attempting "escape from webroot" attacks.

But that does not mean urllib2 should do the same, or that it is a bug for it not to. Arguably, urllib2 should detect the error earlier, so that it can report an "invalid path" or something similar, instead of building an invalid absolute URI that confuses the server into sending back unhelpful errors. But it is right to fail.

It is all well and good to say that the server configuration is wrong, but if you are not in charge of the server, you will probably face an uphill battle trying to convince its owners that their site is broken and needs fixing when it works with every web browser they care about. That means you may need to write your own workaround to deal with their site.

The way to do this with urllib2 is to provide your own HTTPRedirectHandler with an implementation of the redirect_request method that recognizes this case and returns a different Request than the default code would (specifically, http://example.com/index.asp?action=news instead of http://example.com/../index.asp?action=news).
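A minimal sketch of such a handler might look like the following. The names strip_leading_dotdot and DotDotRedirectHandler are invented for this example, and the code assumes the only problem is one or more ".." segments directly under the root of the redirect path. The fallback import is there only so the sketch also runs on Python 3.

```python
try:
    # Python 2, as in the question.
    import urllib2
    import urlparse
except ImportError:
    # Python 3 fallback, where the same classes live in urllib.request.
    import urllib.request as urllib2
    import urllib.parse as urlparse


def strip_leading_dotdot(url):
    """Remove '..' segments that sit directly under the root of the path.

    Hypothetical helper: turns http://host/../x into http://host/x.
    """
    parts = urlparse.urlsplit(url)
    path = parts.path
    while path.startswith('/../'):
        path = path[3:]  # '/../x' -> '/x'
    if path == '/..':
        path = '/'
    return urlparse.urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment))


class DotDotRedirectHandler(urllib2.HTTPRedirectHandler):
    """Redirect handler that cleans the target URL before following it."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        newurl = strip_leading_dotdot(newurl)
        # Let the default implementation build the new Request.
        return urllib2.HTTPRedirectHandler.redirect_request(
            self, req, fp, code, msg, headers, newurl)


# Passing a subclass of a default handler replaces that default handler.
opener = urllib2.build_opener(DotDotRedirectHandler())
```

Requests would then go through opener.open(url, data) instead of urllib2.urlopen(url, data).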


Source: https://habr.com/ru/post/1445898/

