Given the following code executed by the Python interpreter:
import sys
sys.getdefaultencoding()
my_string = '\xc3\xa9'                   # UTF-8 byte string for e-acute
my_string = unicode(my_string, 'utf-8')  # decode the bytes to a unicode object
my_string
print my_string
With Python 2.6.1 running on a Mac, everything works fine:
$ python
Python 2.6.1 (r261:67515, Jun 24 2010, 21:47:49)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getdefaultencoding()
'ascii'
>>> my_string = '\xc3\xa9'
>>> my_string = unicode(my_string, 'utf-8')
>>> my_string
u'\xe9'
>>> print my_string
é
>>>
When Python 2.6.5 runs on Ubuntu 10.04 LTS, it fails:
$ python
Python 2.6.5 (r265:79063, Apr 16 2010, 13:57:41)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getdefaultencoding()
'ascii'
>>> my_string = '\xc3\xa9'
>>> my_string = unicode(my_string, 'utf-8')
>>> my_string
u'\xe9'
>>> print my_string
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 0: ordinal not in range(128)
>>>
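If it helps narrow things down: my understanding is that print encodes a unicode object with sys.stdout.encoding when stdout is a terminal, falling back to the 'ascii' codec otherwise, so comparing that attribute in the two interpreters might be relevant. The values in the comments below are only what I'd expect, not output captured from either machine:

import sys
# print encodes unicode objects using the terminal's encoding when
# stdout is a tty; if this is None (or an ASCII locale), Python falls
# back to 'ascii' and raises UnicodeEncodeError for u'\xe9'.
print sys.stdout.encoding   # e.g. 'UTF-8' on a working setup; None or 'ANSI_X3.4-1968' would explain the failure
print sys.stdout.isatty()   # True in an interactive session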
Has something changed between Python 2.6.1 and 2.6.5 that requires unicode strings to be handled differently? Or is something misconfigured in my (default Ubuntu server 10.04 LTS) Linux environment?
Edit: both environments have LANG=en_US.UTF-8
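One thing I haven't ruled out is whether that locale is actually generated and picked up on the Ubuntu box. A check along these lines should show what Python derives from LANG/LC_* (both calls are standard library; the commented values are just what I'd expect, not actual output):

import locale
# If the en_US.UTF-8 locale isn't generated on the system, these may
# not come back as UTF-8 even though LANG is set.
print locale.getdefaultlocale()      # expect ('en_US', 'UTF-8')
print locale.getpreferredencoding()  # expect 'UTF-8'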