This is a misleading error message that comes from the way Python handles the de/encoding process. You tried to decode an already decoded string a second time, and that confuses the Python function, which in turn confuses you! ;-) As far as I know, the de/encoding process is handled by the codecs module, and the origin of this misleading exception message lies somewhere in there.
You can check yourself: either
u'\x80'.encode('ascii')
or
u'\x80'.decode('ascii')
throws a UnicodeEncodeError, whereas
u'\x80'.encode('utf8')
will not, but
u'\x80'.decode('utf8')
will raise it again!
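One way to see why a decode call can end up complaining about encoding: in Python 2, calling decode on a unicode object first has to turn it into bytes, and that implicit step uses the default codec (ascii). The sketch below imitates that two-step behaviour so it also runs on Python 3, where unicode strings have no decode method at all. The helper name py2_style_decode and its default_codec parameter are made up for this illustration; they are not part of Python.

```python
def py2_style_decode(text, codec, default_codec='ascii'):
    """Hypothetical helper imitating Python 2's unicode.decode()."""
    # Step 1: the implicit encode with the default codec.
    # This is the step that fails for non-ASCII characters.
    raw = text.encode(default_codec)
    # Step 2: the decode that was actually requested.
    return raw.decode(codec)


try:
    py2_style_decode(u'\x80', 'utf8')
    error_name = None
except UnicodeError as exc:
    error_name = type(exc).__name__

print(error_name)  # UnicodeEncodeError
```

The exception comes from step 1, before the requested utf8 decoding is ever attempted, which is why the message talks about encoding.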
I assume that you are confusing the meaning of encoding and decoding. Simply put:
                    decode             encode
ByteString (ascii) --------> UNICODE ---------> ByteString (utf8)
                    codec              codec
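The same round trip as a minimal sketch (the variable names are only for illustration). The b'' and u'' prefixes are used so that it behaves the same on Python 2.6+ and Python 3:

```python
raw_ascii = b'hello'               # ByteString, encoded as ascii

text = raw_ascii.decode('ascii')   # bytes -> unicode
raw_utf8 = text.encode('utf8')     # unicode -> bytes, now encoded as utf8

# A character outside ASCII needs two bytes in utf8.
two_bytes = u'\x80'.encode('utf8')

print(repr(text))
print(repr(raw_utf8))
print(repr(two_bytes))
```

decode always goes from bytes to unicode, and encode always goes from unicode to bytes; the codec only says which byte representation is involved.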
But why is there a codec argument for the decode method? Well, the underlying function cannot guess which codec the ByteString was encoded with, so it takes the codec as an argument. If none is given, it assumes that you mean sys.getdefaultencoding() to be used implicitly.
So when you use c.decode('ascii'), you a) have an (encoded) ByteString (which is why you use decode), b) want to get a unicode object (which is what decode is for), and c) state that the codec in which the ByteString is encoded is ascii.
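A small sketch of those three points, assuming c holds plain ASCII bytes (the sample values are invented for illustration). It also shows what happens when the codec you name does not match the bytes you actually have:

```python
import sys

c = b'abc'                      # a) an encoded ByteString
as_text = c.decode('ascii')     # b) a unicode object, c) codec is ascii

not_ascii = b'\xc2\x80'         # these bytes are utf8, not ascii
try:
    not_ascii.decode('ascii')   # wrong codec for these bytes
    failure = None
except UnicodeDecodeError as exc:
    failure = type(exc).__name__

as_text_utf8 = not_ascii.decode('utf8')  # right codec: works

# 'ascii' on Python 2, 'utf-8' on Python 3
print(sys.getdefaultencoding())
print(failure)  # UnicodeDecodeError
```

Here the error is a genuine UnicodeDecodeError, because decode was called on bytes and no implicit encode step was needed.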
See also:
http://docs.python.org/howto/unicode.html
http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
http://www.stereoplex.com/blog/python-unicode-and-unicodedecodeerror