How can encode('ascii', 'ignore') throw a UnicodeDecodeError?

Question

This line

data = get_url_contents(r[0]).encode('ascii', 'ignore')

creates this error

 UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 11450: ordinal not in range(128)

Why? I assumed that because I pass 'ignore', it would be impossible to get a decoding error when storing the result in the string variable.

python unicode-string

Trindaz 01 Oct '11 at 10:56

1 answer

Thomas K · Answer 1 · 2011-10-02T00:35:13+0000

Due to a quirk of Python 2, you can call encode on a byte string (i.e., text that is already encoded). When you do, Python first tries to convert it to a unicode object by decoding it with the ascii codec, and that implicit decode is strict: the 'ignore' argument applies only to the encode step. So, if get_url_contents returns a byte string, your line effectively does this:

 get_url_contents(r[0]).decode('ascii').encode('ascii', 'ignore')

In Python 3, byte strings do not have an encode method, so the same mistake simply raises an AttributeError.
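
As a rough illustration of the mechanism and one possible fix, here is a minimal sketch written for Python 3. The sample bytes are a hypothetical stand-in for whatever get_url_contents returns, and UTF-8 is only an assumption about the page's encoding; the question does not say what the real encoding is.

```python
# Hypothetical stand-in for the bytes returned by get_url_contents(r[0]);
# 0xc3 0xa9 is the UTF-8 encoding of "é".
raw = b"caf\xc3\xa9 menu"

# Reproduce the step Python 2 performed implicitly before encoding:
# a strict decode with the ascii codec.
try:
    raw.decode("ascii")
    implicit_decode_failed = False
except UnicodeDecodeError as exc:
    implicit_decode_failed = True
    failing_position = exc.start  # index of the first non-ASCII byte

# Possible fix: decode explicitly with the real encoding (UTF-8 is assumed),
# then encode to ASCII, dropping characters that cannot be represented.
data = raw.decode("utf-8").encode("ascii", "ignore")

# In Python 3, bytes objects have no encode method at all.
bytes_has_encode = hasattr(raw, "encode")
```

If the source is not UTF-8, the explicit decode call needs the correct codec name instead; passing 'ignore' or 'replace' to decode is another option when the encoding is uncertain.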

(Of course, I don't know for certain that this is the problem; the error could also be raised inside the get_url_contents function itself. But what I described above is my best guess.)
