Python 3.x Requests Unicode Redirection

I am trying to get the following URL with requests.get() in Python 3.x: http://www.finanzen.net/suchergebnis.asp?strSuchString=DE0005933931 (this URL consists of a base URL plus the search string DE0005933931).

In the browser, the request is redirected (via HTTP status code 301) to http://www.finanzen.net/etf/ishares_core_dax%AE_ucits_etf_de (note the 0xAE character in the URL). Calling requests.get() on the redirected URL directly also works.

When I request the search-string URL with Python 2.7, everything works and I get the redirected response. With Python 3.x, I get the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xae in position 21: invalid start byte

The following snippet reproduces the problem:

import requests

url_1 = 'http://www.finanzen.net/suchergebnis.asp?strSuchString=LU0274208692'
# redirected to http://www.finanzen.net/etf/db_x-trackers_msci_world_index_ucits_etf_1c
url_2 = 'http://www.finanzen.net/suchergebnis.asp?strSuchString=DE0005933931'
# redirected to http://www.finanzen.net/etf/ishares_core_dax%AE_ucits_etf_de

print(requests.get(url_1).status_code)  # working
print(requests.get(url_2).status_code)  # error with Python 3.x

Additional Information:

  • I am working on Windows 7 using Python 3.6.3 with requests.__version__ = '2.18.4', but I get the same error with other versions of Python (3.4, 3.5).
  • Using other search strings, everything works with Python 3.x, for example http://www.finanzen.net/suchergebnis.asp?strSuchString=LU0274208692
  • Interestingly, I even got an Internal Server Error from https://www.hurl.it when trying to GET the above URL. Perhaps this is not a Python issue.

Any idea why this works in Python 2.7 but not in Python 3.x, and what I can do about it?

1 answer

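The server responds with a redirect URL that is encoded as Latin-1 rather than URL-encoded; non-ASCII bytes are shown here as 0x?? hex escapes: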

Location: /etf/ishares_core_dax0xAE_ucits_etf_de

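The 0xAE byte is not a valid URL character; the server is violating the standard here. What it should be sending is either

Location: /etf/ishares_core_dax%AE_ucits_etf_de

or

Location: /etf/ishares_core_dax%C2%AE_ucits_etf_de

that is, the URL-encoded form of either the Latin-1 or the UTF-8 encoding of the URL.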
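To make the difference concrete, here is a small offline sketch using only the standard library. It assumes the raw byte stands for the registered-trademark sign (U+00AE); the variable names are illustrative and not part of requests. It shows that a bare 0xAE byte cannot be decoded as UTF-8, at the same position the traceback in the question reports, and what the two properly escaped forms look like.

```python
from urllib.parse import quote

# Assumed reconstruction of the redirect path, with U+00AE in place of the raw byte.
path = '/etf/ishares_core_dax\u00ae_ucits_etf_de'

# What the server puts on the wire: a single bare 0xAE byte, not URL-encoded.
raw = path.encode('latin-1')

try:
    raw.decode('utf-8')
    failed_at = None
except UnicodeDecodeError as exc:
    # Index of the byte that is not valid UTF-8.
    failed_at = exc.start

# The two forms the server could legitimately have sent instead.
latin1_escaped = quote(path, encoding='latin-1')
utf8_escaped = quote(path, encoding='utf-8')

print(failed_at)       # 21
print(latin1_escaped)  # /etf/ishares_core_dax%AE_ucits_etf_de
print(utf8_escaped)    # /etf/ishares_core_dax%C2%AE_ucits_etf_de
```

Either escaped form is plain ASCII, so a client can decode it without guessing the server's encoding.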

We can patch requests to be more robust in the face of this error by returning the Location header unchanged:

from requests.sessions import SessionRedirectMixin

def get_redirect_target(
        self, resp, _orig=SessionRedirectMixin.get_redirect_target):
    try:
        return _orig(self, resp)
    except UnicodeDecodeError:
        return resp.headers['location']

SessionRedirectMixin.get_redirect_target = get_redirect_target



Source: https://habr.com/ru/post/1688735/

