Python: Unicode encoding differs between Mac and Ubuntu

While developing a WAS server with Tornado 3.2.2, I ran into a Unicode problem after switching my system from Mac to Ubuntu.

On the Mac, it works fine. However, with the same database (a remote MySQL server) and the same source code, I get a different result on Ubuntu.

The only differences between the two are the operating system (Mac vs. Ubuntu 14.04) and the Python version (Mac: 2.7.8, Ubuntu: 2.7.6).

On the Mac, it shows the correct result:

"remark": "30\uc77c \uc774\uc6a9\uad8c"

But on Ubuntu it looks like this:

"remark": "30? ???"

I spent two days trying everything I could find on the Internet, but I can't figure out why.

Here is what I tried:

import string
import chardet

print(type(test_dict["remark"]))
print(test_dict["remark"].encode("utf-8").decode("euc-kr"))
print(test_dict["remark"].decode("utf-8").encode("euc-kr"))
print(test_dict["remark"].encode("euc-kr").decode("utf-8"))
print(test_dict["remark"].decode("euc-kr").encode("utf-8"))
print(unicode(test_dict["remark"], 'utf-8'))
encoding = chardet.detect(test_dict["remark"])
print(encoding)
print(test_dict["remark"].decode("unicode-escape"))
print(unicode(test_dict["remark"], "utf-8"))
print(unicode(test_dict["remark"], "utf-8").decode("utf-8").encode("utf-8"))
print(unicode(test_dict["remark"], "utf-8").encode("utf-8").decode("utf-8"))
for c in test_dict["remark"]:
    if c not in string.ascii_letters:
        print(" not ascii")
    else:
        print("ascii")
print(test_dict["remark"].decode(encoding["encoding"]).encode("utf-8"))
print(test_dict["remark"].encode("utf-8"))
print(test_dict["remark"].decode("utf-8").encode("euc-kr"))
print(unicode(test_dict["remark"].decode("utf-8").encode("utf-8")))

I also tried the functions in tornado.escape.

The output is below.

Ubuntu:

<type 'str'>
30? ???
30? ???
30? ???
30? ???
30? ???
{'confidence': 1.0, 'encoding': 'ascii'}
30? ???
30? ???
30? ???
30? ???
 not ascii
 not ascii
 not ascii
 not ascii
 not ascii
 not ascii
 not ascii
30? ???
30? ???
30? ???
30? ???
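The output above is consistent with the string already containing literal question marks before any of these calls run. The following is a minimal sketch of how that can happen, assuming a lossy encoding step somewhere upstream of this code (where that step actually occurs is not established here, and the names `original`, `lossy`, and `recovered` are illustrative only):

```python
from __future__ import print_function

# The text as it appears on the Mac.
original = u"30\uc77c \uc774\uc6a9\uad8c"

# Encoding to a charset that cannot represent Hangul, with errors="replace",
# substitutes "?" for every character that cannot be encoded.
lossy = original.encode("ascii", "replace")
print(lossy)

# After that, each "?" is just the byte 0x3F. No combination of
# decode()/encode() calls can bring the original characters back.
recovered = lossy.decode("utf-8")
print(recovered == original)
```

If this is what happens, the fix has to be applied where the text is first converted, not in the code that receives `test_dict["remark"]`.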

euc-kr

Locale on Mac:

LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL="en_US.UTF-8"

Locale on Ubuntu:

LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8

In addition, chardet reports a different encoding on each system:

encoding = chardet.detect(test_dict["remark"])

Mac

{'confidence': 0.938125, 'encoding': 'utf-8'}

Ubuntu

{'confidence': 1.0, 'encoding': 'ascii'}
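A detection result of ascii is what one would expect if every byte of the value is below 0x80, that is, if no multi-byte characters reached the program at all. Here is a small sketch of that check that does not depend on chardet; the helper `is_pure_ascii` and the sample values are hypothetical and only mirror the strings shown in this question:

```python
from __future__ import print_function

def is_pure_ascii(data):
    """Return True if every byte in data is below 0x80."""
    # bytearray() yields integers on both Python 2 and Python 3.
    return all(b < 0x80 for b in bytearray(data))

# The Mac value, encoded as UTF-8: each Hangul syllable takes three bytes.
utf8_bytes = u"30\uc77c \uc774\uc6a9\uad8c".encode("utf-8")

# The Ubuntu value: seven plain ASCII bytes.
damaged = b"30? ???"

print(is_pure_ascii(utf8_bytes), len(utf8_bytes))
print(is_pure_ascii(damaged), len(damaged))
```

The difference in length (15 bytes versus 7) suggests that the Ubuntu value is not a mis-decoded form of the same bytes but a different, shorter byte string.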

Why does the same value come back differently on the two systems?


Printing the same string works under Python 2.7.6 on Linux:

Python 2.7.6 (default, Mar 22 2014, 22:59:56) 
[GCC 4.8.2] on linux2
>>> print u"30\uc77c \uc774\uc6a9\uad8c"
30일 이용권
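The `\uXXXX` form shown for the Mac is the same text in JSON-escaped form. As a rough sketch, assuming the response is produced by a JSON encoder that escapes non-ASCII characters (the standard `json` module does this by default; whether the server code uses it directly is an assumption):

```python
from __future__ import print_function
import json

text = u"30\uc77c \uc774\uc6a9\uad8c"

# With the default ensure_ascii=True, non-ASCII characters are written
# as \uXXXX escapes, so the encoded output is pure ASCII.
encoded = json.dumps({"remark": text})
print(encoded)

# Decoding the JSON restores the original Unicode string.
decoded = json.loads(encoded)["remark"]
print(decoded == text)
```

So the escaped output on the Mac is intact data, while the question marks on Ubuntu are not an escaped form of anything.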

, , , - , UTF-8.

, OS X "30\uc77c\uc774\uc6a9\uad8c", , ( ( )).


Source: https://habr.com/ru/post/1548719/