How can encode('ascii', 'ignore') throw a UnicodeDecodeError?

This line

data = get_url_contents(r[0]).encode('ascii', 'ignore') 

creates this error

 UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 11450: ordinal not in range(128) 

Why? I assumed that because I am using "ignore", it would be impossible to get decoding errors when saving the output to a string variable.

1 answer

Due to a quirk of Python 2, you can call encode on a byte string (i.e., text that is already encoded). In that case, Python first tries to convert it to a unicode object by decoding it with the ascii codec. So, if get_url_contents returns a byte string, your line effectively does this:

 get_url_contents(r[0]).decode('ascii').encode('ascii', 'ignore') 
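The explicit form can be run on its own to reproduce the failure. This is only a minimal sketch: the value of raw is a made-up stand-in for whatever get_url_contents returns, assumed here to be UTF-8 encoded bytes containing a non-ASCII character.

```python
# Hypothetical stand-in for the bytes returned by get_url_contents
# (assumption: the page is UTF-8 encoded). The "é" becomes the two
# bytes 0xc3 0xa9, neither of which is valid ASCII.
raw = u"caf\u00e9".encode("utf-8")

error = None
try:
    # The decode step runs first and fails before 'ignore' is ever consulted,
    # because 'ignore' only applies to the encode step.
    raw.decode("ascii").encode("ascii", "ignore")
except UnicodeDecodeError as exc:
    error = exc

print(error)
```

The error handler passed to encode never gets a chance to act, which is why "ignore" does not help in the question.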

In Python 3, byte strings do not have an encode method, so the same mistake simply raises an AttributeError.
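If that guess is right, one possible way around it is to decode the bytes with their real encoding first and only then encode to ASCII with "ignore". The sketch below assumes the data is UTF-8; to_ascii and its source_encoding parameter are hypothetical names used for illustration, not part of the original code.

```python
def to_ascii(data, source_encoding="utf-8"):
    """Return ASCII bytes, dropping characters that have no ASCII form.

    source_encoding is an assumption about how the input bytes were encoded.
    """
    if isinstance(data, bytes):
        # Decode explicitly instead of relying on an implicit ascii decode.
        data = data.decode(source_encoding)
    # Now 'ignore' applies to the step that actually matters.
    return data.encode("ascii", "ignore")


# Hypothetical stand-in for the bytes returned by get_url_contents.
raw = u"caf\u00e9".encode("utf-8")
ascii_bytes = to_ascii(raw)
print(ascii_bytes)
```

Here the non-ASCII character is silently dropped rather than raising. If the real encoding is not UTF-8, the decode step would need the correct codec name instead.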

(Of course, I do not know for certain that this is the problem; it could be something inside the get_url_contents function. But what I described above is my best guess.)


Source: https://habr.com/ru/post/898463/

