UnicodeDecodeError issue with mechanize

Question

I get the following string from a site using mechanize:

'We\x92ve'

I know that \x92 stands for the character ’ (a right single quotation mark). I am trying to convert this string to Unicode:

 >>> unicode('We\x92ve','utf-8')
 UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 2: unexpected code byte

What am I doing wrong?

Edit: the reason I tried to use "utf-8" was:

 >>> response = browser.response()
 >>> response.info()['content-type']
 'text/html; charset=utf-8'

Now I see that I cannot always trust the Content-Type header.

+3

parxier Feb 21 '10 at 13:24

1 answer

Max Shawabkeh · Accepted Answer · 2010-02-21T13:30:57+0000

\x92 does mean ’, but only in the Windows-1252 encoding, not in UTF-8:

 >>> print unicode('We\x92ve','1252')
 We’ve
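For anyone on Python 3, where unicode() no longer exists, here is a rough equivalent of the same check. It is only a sketch: it assumes the response body is available as a bytes object, and the variable names raw and text are made up for illustration.

```python
# Bytes as they arrive from the site (assumed to be a bytes object).
raw = b'We\x92ve'

try:
    raw.decode('utf-8')
except UnicodeDecodeError as exc:
    # 0x92 has the bit pattern of a UTF-8 continuation byte,
    # so it cannot begin a character and strict decoding fails.
    print('utf-8 failed at position', exc.start)

# Windows-1252 maps byte 0x92 to U+2019, the right single quotation mark.
text = raw.decode('cp1252')
print(text)
```

'cp1252' is the standard Python codec name for Windows-1252; the '1252' alias used above should work as well.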

If you do not know the encoding of the source data, you can detect it with chardet (extremely easy to use).
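If you would rather not add a dependency, another option is to try the charset the server declares and fall back to Windows-1252 when that fails. This is a minimal sketch in Python 3, not a replacement for real detection: the function name decode_with_fallback and the choice of cp1252 as the fallback are my own assumptions, based on how often mislabeled pages turn out to be Windows-1252.

```python
def decode_with_fallback(raw, declared='utf-8', fallback='cp1252'):
    """Decode bytes with the declared charset, then with a fallback.

    Hypothetical helper for illustration; 'declared' would normally
    come from the Content-Type header of the response.
    """
    for encoding in (declared, fallback):
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # Wrong charset (or unknown codec name): try the next one.
            continue
    # cp1252 leaves a few bytes undefined, so as a last resort
    # substitute U+FFFD for anything that still cannot be decoded.
    return raw.decode(fallback, errors='replace')


page_text = decode_with_fallback(b'We\x92ve', declared='utf-8')
print(page_text)
```

Correctly labeled UTF-8 input is decoded on the first attempt, so the fallback only kicks in when the header turns out to be wrong.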