While developing a web application server (WAS) with Tornado 3.2.2,
I ran into a Unicode problem after moving from a Mac to Ubuntu.
On the Mac, everything works fine.
However, with the same database (a remote MySQL server) and the same source code, I get a different
result on Ubuntu.
The only differences between the two are the operating system (Mac vs. Ubuntu 14.04)
and the Python version (Mac: 2.7.8, Ubuntu: 2.7.6).
On the Mac, it shows the correct result:
"remark": "30\uc77c \uc774\uc6a9\uad8c"
But on Ubuntu it looks like this:
"remark": "30? ???"
I spent two days trying everything I could find on the Internet,
but I can't figure out why.
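For reference, the Ubuntu result has the shape that a lossy conversion leaves behind: each Hangul character becomes one literal "?". The sketch below reproduces that shape. It is only an illustration: `latin-1` is an assumed stand-in for "some charset that cannot represent Korean", and where the conversion actually happens in my setup (for example somewhere between MySQL and Python) is a guess, not something I have confirmed.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# The value as it appears on the Mac.
original = u"30\uc77c \uc774\uc6a9\uad8c"

# Assumption: some layer converts the text to a charset without Hangul
# and substitutes "?" for every character it cannot represent.
lossy = original.encode("latin-1", "replace")
print(repr(lossy))

# After that, the bytes are plain ASCII, so decoding them with any
# ASCII-compatible codec just gives back the question marks.
recovered = lossy.decode("utf-8")
print(recovered == original)

# For comparison, a UTF-8 round trip keeps the text intact.
intact = original.encode("utf-8")
print(intact.decode("utf-8") == original)
```

If this is what is happening, none of the encode/decode combinations on the Python side can bring the original characters back, because they are already gone from the byte string.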
Here is what I tried:
print(type(test_dict["remark"]))
print(test_dict["remark"].encode("utf-8").decode("euc-kr"))
print(test_dict["remark"].decode("utf-8").encode("euc-kr"))
print(test_dict["remark"].encode("euc-kr").decode("utf-8"))
print(test_dict["remark"].decode("euc-kr").encode("utf-8"))
print(unicode(test_dict["remark"], 'utf-8'))
encoding = chardet.detect(test_dict["remark"])
print(encoding)
print(test_dict["remark"].decode("unicode-escape"))
print(unicode(test_dict["remark"], "utf-8"))
print(unicode(test_dict["remark"], "utf-8").decode("utf-8").encode("utf-8"))
print(unicode(test_dict["remark"], "utf-8").encode("utf-8").decode("utf-8"))
for c in test_dict["remark"]:
    if c not in string.ascii_letters:
        print(" not ascii")
    else:
        print("ascii")
print(test_dict["remark"].decode(encoding["encoding"]).encode("utf-8"))
print(test_dict["remark"].encode("utf-8"))
print(test_dict["remark"].decode("utf-8").encode("euc-kr"))
print(unicode(test_dict["remark"].decode("utf-8").encode("utf-8")))
I also tried tornado.escape.
Ubuntu:
<type 'str'>
30? ???
30? ???
30? ???
30? ???
30? ???
{'confidence': 1.0, 'encoding': 'ascii'}
30? ???
30? ???
30? ???
30? ???
not ascii
not ascii
not ascii
not ascii
not ascii
not ascii
not ascii
30? ???
30? ???
30? ???
30? ???
I also checked the locale on both machines, in case euc-kr was involved:
Mac
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL="en_US.UTF-8"
Ubuntu
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8
One more thing: the result of the following line differs between the two machines.
encoding = chardet.detect(test_dict["remark"])
Mac
{'confidence': 0.938125, 'encoding': 'utf-8'}
Ubuntu
{'confidence': 1.0, 'encoding': 'ascii'}
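The chardet difference can be checked without any guessing by looking at the raw bytes. This is a minimal sketch; `describe_bytes` is a hypothetical helper name of my own, and the two sample values are built by hand to mimic the Mac and Ubuntu cases rather than read from the database.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import binascii


def describe_bytes(raw):
    """Return (hex string, is_pure_ascii) for a byte string."""
    hex_form = binascii.hexlify(raw).decode("ascii")
    # bytearray yields integers on both Python 2 and Python 3.
    is_ascii = all(b < 0x80 for b in bytearray(raw))
    return hex_form, is_ascii


# Hand-built stand-ins for the two cases (not read from the database).
mac_like = u"30\uc77c \uc774\uc6a9\uad8c".encode("utf-8")
ubuntu_like = b"30? ???"

print(describe_bytes(mac_like))
print(describe_bytes(ubuntu_like))
```

In the Ubuntu-like value every "?" is the single byte 0x3f, which would explain why chardet reports ascii with full confidence there and utf-8 on the Mac.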
What could be causing this, and how can I fix it?