Python output replaces non-ASCII characters with

I am using Python 2.7 to read data from a MySQL table. In MySQL, the name is as follows:

Garasa, Ángel.

But when I print it in Python, the output is:

Garasa, ngel

The character set name in MySQL is utf8. This is my Python code:

# coding: utf-8

import MySQLdb

connection = MySQLdb.connect(host="localhost", user="root", passwd="root", db="jmdb")
cursor = connection.cursor ()
cursor.execute ("select * from actors where actorid=672462;")
data = cursor.fetchall ()
for row in data:
    print  "IMDB Name=",row[4]
    wiki=("".join(row[4]))
    print wiki

I tried to decode it, but I get an error, for example:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xc1 in position 8: invalid start byte

I read about decoding and UTF-8, but could not find a solution.

+4
2 answers

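Have the MySQL driver return Unicode strings instead. That way you do not have to deal with decoding in your own code.

Set use_unicode=True in the connection parameters. If the table uses a specific encoding, also set the charset parameter.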
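A minimal sketch of how this might look with the connection from the question. The `charset="utf8"` value is an assumption based on the table's character set, and `fetch_actor_name` is a hypothetical helper, not part of MySQLdb. Column index 4 is taken from the question's code.

```python
# -*- coding: utf-8 -*-

# Connection settings copied from the question. charset="utf8" is an
# assumption; use whatever encoding the connection should deliver.
CONNECT_KWARGS = dict(
    host="localhost",
    user="root",
    passwd="root",
    db="jmdb",
    use_unicode=True,  # ask the driver to return unicode objects
    charset="utf8",    # encoding used on the client connection
)


def fetch_actor_name(actor_id):
    """Hypothetical helper: return the name column as a unicode string."""
    import MySQLdb  # imported here so this module loads without the driver

    connection = MySQLdb.connect(**CONNECT_KWARGS)
    try:
        cursor = connection.cursor()
        # Let the driver substitute the parameter instead of formatting SQL.
        cursor.execute("select * from actors where actorid=%s", (actor_id,))
        row = cursor.fetchone()
        return row[4] if row else None
    finally:
        connection.close()
```

With these settings the value in `row[4]` should already be a unicode object, so no explicit `.decode()` call is needed before printing.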

+1

It looks like the string is encoded in cp1252:

>>> s = 'Garasa, Ángel.'
>>> s.decode('utf-8')

Traceback (most recent call last):
  File "<pyshell#63>", line 1, in <module>
    s.decode('utf-8')
  File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xc1 in position 8: invalid start byte

>>> s.decode('cp1252')
u'Garasa, \xc1ngel.'
>>>
>>> print s.decode('cp1252')
Garasa, Ángel.

Edit: or, possibly, latin-1:

>>> s.decode('latin-1')
u'Garasa, \xc1ngel.'
>>> print s.decode('latin-1')
Garasa, Ángel.
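If the encoding of incoming byte strings is not known in advance, one possible approach is to try UTF-8 first and fall back to cp1252. The sketch below assumes those two candidates; `to_unicode` is a hypothetical helper name, and the final latin-1 fallback is there only because latin-1 can decode any byte.

```python
# -*- coding: utf-8 -*-


def to_unicode(raw, encodings=("utf-8", "cp1252")):
    """Decode a byte string by trying each candidate encoding in order."""
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a code point, so this never raises.
    return raw.decode("latin-1")


name = to_unicode(b"Garasa, \xc1ngel.")  # not valid UTF-8, so cp1252 is used
```

This is a guess-based fallback, so it can pick the wrong encoding for ambiguous input. Fixing the connection settings is the more reliable route.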

cp1252 and latin-1 are nearly identical; they differ only in the range 128 to 159.

Quoting from this source (latin-1):

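The Windows-1252 code page coincides with ISO-8859-1 for all codes except the range 128 to 159 (hex 80 to 9F), where the little-used C1 controls are replaced with additional characters, including all the missing characters provided by ISO-8859-15.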

this one (cp1252):

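This character encoding is a superset of ISO 8859-1 in terms of printable characters, but differs from the IANA's ISO-8859-1 by using displayable characters rather than control characters in the 80 to 9F (hex) range.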
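The claim about the differing range can be checked directly by decoding every possible byte with both codecs. This is only a sketch: `differing_bytes` is a hypothetical helper, and bytes that Python's cp1252 codec leaves undefined are decoded with `errors="replace"` so that they count as different.

```python
# -*- coding: utf-8 -*-


def differing_bytes():
    """Return the byte values that cp1252 and latin-1 decode differently."""
    diffs = []
    for value in range(256):
        raw = bytes(bytearray([value]))  # one-byte string on Python 2 and 3
        as_latin1 = raw.decode("latin-1")
        # A few cp1252 bytes are undefined; "replace" yields U+FFFD for them.
        as_cp1252 = raw.decode("cp1252", "replace")
        if as_latin1 != as_cp1252:
            diffs.append(value)
    return diffs


diffs = differing_bytes()
print("%d differing bytes, from %d to %d" % (len(diffs), diffs[0], diffs[-1]))
```

Byte 0xC1 (Á) lies outside that range, which is why both codecs give the same result for this name.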

+1

Source: https://habr.com/ru/post/1622462/

