UnicodeDecodeError issue with mechanize

I get the following string from a site using mechanize:

'We\x92ve' 

I know that \x92 stands for the character ’. I am trying to convert this string to Unicode:

    >>> unicode('We\x92ve','utf-8')
    UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 2: unexpected code byte

What am I doing wrong?

Edit: the reason I tried to use "utf-8" was:

    >>> response = browser.response()
    >>> response.info()['content-type']
    'text/html; charset=utf-8'

Now I see that I cannot always trust the Content-Type header.

1 answer

\x92 does mean ’, but in the Windows-1252 encoding, not in UTF-8:

    >>> print unicode('We\x92ve','1252')
    We’ve
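To see why UTF-8 rejects this byte, here is a minimal sketch. It assumes Python 3 (the session above is Python 2), where `bytes.decode` plays the role of `unicode(...)`; the variable names are only illustrative.

```python
raw = b'We\x92ve'  # the bytes from the question

# 0x92 is 0b10010010: a UTF-8 continuation byte, which cannot start a character.
try:
    raw.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# In Windows-1252 the same byte maps to U+2019 (right single quotation mark).
text = raw.decode('cp1252')

# Re-encoding as UTF-8 shows what a real UTF-8 page would contain for that character.
utf8_bytes = text.encode('utf-8')

print(utf8_ok)      # False
print(ascii(text))  # 'We\u2019ve'
print(utf8_bytes)   # b'We\xe2\x80\x99ve'
```

So the page declares UTF-8 but actually contains Windows-1252 bytes, which is why the header could not be trusted here.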

If you do not know the encoding of the source data, you can detect it with chardet (extremely easy to use).
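A rough sketch of how that detection might look, again assuming Python 3. `guess_encoding` and its `cp1252` default are hypothetical helpers for illustration, not part of chardet; only `chardet.detect` is the library's own call. The guess is a heuristic, and very short inputs like this one tend to come back with low confidence.

```python
import codecs

def guess_encoding(raw, default='cp1252'):
    """Return chardet's guess for raw bytes, or `default` if there is none.

    Hypothetical helper: falls back to `default` when chardet is not
    installed, gives no guess, or names a codec Python does not know.
    """
    try:
        import chardet  # third-party package: pip install chardet
    except ImportError:
        return default
    # chardet.detect returns a dict with 'encoding' and 'confidence' keys.
    name = chardet.detect(raw).get('encoding')
    try:
        codecs.lookup(name)
    except (LookupError, TypeError):  # unknown codec name, or None
        return default
    return name

raw = b'We\x92ve'
encoding = guess_encoding(raw)
# errors='replace' keeps decoding from raising if the guess is wrong.
text = raw.decode(encoding, errors='replace')
print(encoding, ascii(text))
```

For real pages, pass the whole response body rather than a short fragment, since the detector has more evidence to work with.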


Source: https://habr.com/ru/post/1301982/