Python unicode: why does it work on one machine but sometimes fail on another?

Question

I find unicode in Python really troublesome. Why doesn't Python use UTF-8 for all strings? I am in China, so I need to use some Chinese strings that cannot be represented in ASCII. I use u'' to denote such a string. It works well on my Ubuntu machine, but on another Ubuntu machine (a VPS provided by linode.com) it sometimes fails. The error is:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 0: ordinal not in range(128)

The code I use is:

 self.talk(user.record["fullname"] + u"准备好了")

+4

python unicode python-2.x

Bin cin Dec 23 '10 at 12:36


3 answers

You need to decode all non-Unicode strings as early as possible. Try to make sure that you never keep UTF-8 bytes anywhere in memory, only unicode objects. For example, make sure the elements of user.record are all converted to unicode when they are created, and you won't get errors like this. Or just use Python 3, where it is hard to mix the two.
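A minimal sketch of the "decode early" advice, assuming the incoming values are UTF-8 encoded byte strings. The helper name `to_text` and the sample record are hypothetical and not part of the original code; the snippet is written to run on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
def to_text(value, encoding="utf-8"):
    """Return text for `value`, decoding byte strings with `encoding`.

    Hypothetical helper: the UTF-8 default is an assumption about the data.
    """
    if isinstance(value, bytes):  # `bytes` is an alias of `str` on Python 2
        return value.decode(encoding)
    return value


# Hypothetical record as it might arrive from a database or a socket.
raw_record = {"fullname": b"\xe9\x99\x88"}  # UTF-8 bytes for the character U+9648

# Decode once, at the boundary, so the rest of the program only sees text.
record = dict((key, to_text(val)) for key, val in raw_record.items())

message = record["fullname"] + u"准备好了"
```

With every value decoded where it enters the program, the later concatenation never has to fall back on the implicit ASCII decode.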

+1

Rosh oxymoron Dec 23 '10 at 12:55


Because in Python 2.x the default encoding is ASCII unless it has been changed manually. Here's a rough hack to put in your script before any other code:

 import sys
 reload(sys)
 sys.setdefaultencoding("utf-8")

This will change the default Python encoding to UTF-8.

0

ismail Dec 23 '10 at 12:48


mouad · Accepted Answer · 2010-12-23T12:58:16+0000

The famous UnicodeDecodeError typically shows up when you do some string manipulation, like the one you just did:

 user.record["fullname"] + u" 准备好了"

Here you are concatenating a str with a unicode, so Python implicitly coerces the str to unicode before concatenating. The coercion is equivalent to:

 unicode(user.record["fullname"]) + u" 准备好了"
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Problem

And that is where the problem lies: when executing unicode(something), Python decodes the string using the default encoding, which is ASCII in Python 2.*. If your user.record["fullname"] string happens to contain a non-ASCII character, it raises the familiar UnicodeDecodeError. A value that is pure ASCII decodes without trouble, so the error appears only for some records.
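To see the failing decode in isolation, the sketch below performs the same ASCII decode explicitly. The byte string is a hypothetical stand-in for `user.record["fullname"]`; its first byte is 0xe9, as in the reported error.

```python
# Hypothetical stand-in for user.record["fullname"]: UTF-8 encoded bytes.
fullname = b"\xe9\x99\x88"

try:
    # This is the decode that Python 2 performs implicitly for str + unicode.
    fullname.decode("ascii")
except UnicodeDecodeError as exc:
    error = exc  # keep a reference; Python 3 clears `exc` after the block
    print(exc)

# Decoding with the encoding the bytes were actually written in succeeds.
decoded = fullname.decode("utf-8")
```

The caught exception reports the `ascii` codec and position 0, matching the traceback in the question.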

Here is how you can fix it:

 # Decode the str to unicode using the right encoding.
 # Here I used utf-8 because it is usually the right one, but it may not be (another problem!).
 a = user.record["fullname"].decode('utf-8')
 self.talk(a + u" 准备好了")

PS: In Python 3 the default encoding is UTF-8. One more thing: you can no longer concatenate a unicode string (str in Python 3) with a byte string (bytes in Python 3), so there is no more implicit coercion.
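A short Python 3 sketch of the point above; the name bytes are a hypothetical sample, and the snippet assumes it is run on Python 3.

```python
import sys

# On Python 3 the default encoding is UTF-8.
default_encoding = sys.getdefaultencoding()

name_bytes = b"\xe9\x99\x88"  # hypothetical UTF-8 encoded name

try:
    name_bytes + "准备好了"  # bytes + str: no implicit coercion on Python 3
    mixed_ok = True
except TypeError:
    mixed_ok = False

# An explicit decode is required, the same fix as on Python 2.
greeting = name_bytes.decode("utf-8") + "准备好了"
```

Mixing the two types fails immediately with a TypeError, whatever the bytes contain, instead of failing only when non-ASCII data shows up.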
