UnicodeEncodeError: "latin-1" codec cannot encode character

What could be causing this error when I try to insert a foreign character into the database?

>>UnicodeEncodeError: 'latin-1' codec can't encode character u'\u201c' in position 0: ordinal not in range(256) 

And how do I solve it?

Thanks!

+49
python mysql unicode pylons
Oct 15 '10 at 13:57
8 answers

The character U+201C LEFT DOUBLE QUOTATION MARK is not present in the Latin-1 (ISO-8859-1) encoding.

It is present in code page 1252 (Western European). This is a Windows-specific encoding that is based on ISO-8859-1 but puts extra characters into the range 0x80-0x9F. Code page 1252 is often confused with ISO-8859-1, and it is an annoying but now-standard web browser behavior that if you serve your pages as ISO-8859-1, the browser will treat them as cp1252 instead. However, they really are two distinct encodings:

    >>> u'He said \u201CHello\u201D'.encode('iso-8859-1')
    UnicodeEncodeError
    >>> u'He said \u201CHello\u201D'.encode('cp1252')
    'He said \x93Hello\x94'

If you are using your database only as a byte store, you can use cp1252 to encode “ and the other characters present in the Windows Western code page. But other Unicode characters that are not in cp1252 will still cause errors.
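A minimal sketch of that limitation, written for Python 3 (the session above uses Python 2 syntax). The helper name `can_encode` and the sample strings are made up for illustration:

```python
def can_encode(text, encoding):
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


quoted = "He said \u201cHello\u201d"  # curly quotes: in cp1252, not in Latin-1
snowman = "\u2603"                    # SNOWMAN: in neither single-byte encoding

for sample in (quoted, snowman):
    for encoding in ("iso-8859-1", "cp1252", "utf-8"):
        print(ascii(sample), encoding, can_encode(sample, encoding))
```

Only UTF-8 accepts both samples, which is why cp1252 is a stopgap rather than a fix.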

You can use encode(..., 'ignore') to suppress the errors by dropping the characters, but really, in this century you should be using UTF-8 in both your database and your pages. This encoding allows any character to be used. Ideally, you should also tell MySQL that you are using UTF-8 strings (by setting the database connection character set and the collation on string columns), so that it can get case-insensitive comparison and sorting right.
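To make the cost of suppressing the error concrete, here is a small Python 3 sketch of the standard `'ignore'` and `'replace'` error handlers. The sample string is reused from the session above and the variable names are arbitrary:

```python
text = "He said \u201cHello\u201d"

# 'ignore' silently drops whatever Latin-1 cannot represent.
dropped = text.encode("latin-1", "ignore")

# 'replace' substitutes a question mark for each such character.
replaced = text.encode("latin-1", "replace")

print(dropped)   # b'He said Hello'
print(replaced)  # b'He said ?Hello?'
```

Either way the original quotation marks cannot be recovered from the stored bytes.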

+43
Oct. 15 2018-10-15

I ran into this same issue when using the Python MySQLdb module. Since MySQL will let you store just about any binary data you want in a text field regardless of character set, I found my solution here:

Using UTF8 with Python MySQLdb

Edit: Quote from the above url to satisfy the request in the first comment ...

"UnicodeEncodeError: 'latin-1' codec cannot encode a character ..."

This is because MySQLdb normally tries to encode everything to latin-1. This can be fixed by executing the following commands right after you have established the connection:

    db.set_character_set('utf8')
    dbc.execute('SET NAMES utf8;')
    dbc.execute('SET CHARACTER SET utf8;')
    dbc.execute('SET character_set_connection=utf8;')

"db" is the result of MySQLdb.connect() , and "dbc" is the result of db.cursor() .
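For context, here is one way the quoted commands could be wrapped up. This is only a sketch: `force_utf8` is a hypothetical helper name, and the connection parameters in the usage comment are placeholders, not values from the linked article.

```python
def force_utf8(db):
    """Run the commands quoted above on an open MySQLdb connection.

    `db` is assumed to be the object returned by MySQLdb.connect().
    Returns the cursor that was used, so the caller can keep working with it.
    """
    db.set_character_set('utf8')
    dbc = db.cursor()
    dbc.execute('SET NAMES utf8;')
    dbc.execute('SET CHARACTER SET utf8;')
    dbc.execute('SET character_set_connection=utf8;')
    return dbc


# Usage (needs the MySQLdb package and a reachable server; the
# parameters below are placeholders):
#
#     import MySQLdb
#     db = MySQLdb.connect(host="localhost", user="root", passwd="", db="testdb")
#     dbc = force_utf8(db)
```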

+65
Aug 21 '12 at 23:28

I hope your database is at least UTF-8. Then you need to run yourstring.encode('utf-8') before trying to put it in the database.
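As a rough illustration in Python 3 (where the result of `encode` is a `bytes` object), this is what that call produces for the character from the question. The sample value is made up:

```python
yourstring = "\u201cHello\u201d"

encoded = yourstring.encode("utf-8")
print(encoded)  # b'\xe2\x80\x9cHello\xe2\x80\x9d'

# Decoding the stored bytes as UTF-8 gives back the original text.
decoded = encoded.decode("utf-8")
print(decoded == yourstring)  # True
```

Each curly quote becomes three bytes, so nothing is lost, provided the database side also treats the column as UTF-8.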

+16
Oct. 15 2018-10-15

The best solution

  • set mysql charset to 'utf-8'
  • follow this comment (add use_unicode=True and charset="utf8" )

    db = MySQLdb.connect(host="localhost", user="root", passwd="", db="testdb", use_unicode=True, charset="utf8") - KyungHoon Kim Mar 13 '14 at 17:04

For details, see:

    class Connection(_mysql.connection):

        """MySQL Database Connection Object"""

        default_cursor = cursors.Cursor

        def __init__(self, *args, **kwargs):
            """
            Create a connection to the database. It is strongly recommended
            that you only use keyword parameters. Consult the MySQL C API
            documentation for more information.

            host
              string, host to connect
            user
              string, user to connect as
            passwd
              string, password to use
            db
              string, database to use
            port
              integer, TCP/IP port to connect to
            unix_socket
              string, location of unix_socket to use
            conv
              conversion dictionary, see MySQLdb.converters
            connect_timeout
              number of seconds to wait before the connection attempt
              fails.
            compress
              if set, compression is enabled
            named_pipe
              if set, a named pipe is used to connect (Windows only)
            init_command
              command which is run once the connection is created
            read_default_file
              file from which default client values are read
            read_default_group
              configuration group to use from the default file
            cursorclass
              class object, used to create cursors (keyword only)
            use_unicode
              If True, text-like columns are returned as unicode objects
              using the connection character set. Otherwise, text-like
              columns are returned as strings. Unicode objects will always
              be encoded to the connection character set regardless of
              this setting.
            charset
              If supplied, the connection character set will be changed
              to this character set (MySQL-4.1 and newer). This implies
              use_unicode=True.
            sql_mode
              If supplied, the session SQL mode will be changed to this
              setting (MySQL-4.1 and newer). For more details and legal
              values, see the MySQL documentation.
            client_flag
              integer, flags to use or 0
              (see MySQL docs or constants/CLIENTS.py)
            ssl
              dictionary or mapping, contains SSL connection parameters;
              see the MySQL documentation for more details
              (mysql_ssl_set()). If this is set, and the client does not
              support SSL, NotSupportedError will be raised.
            local_infile
              integer, non-zero enables LOAD LOCAL INFILE; zero disables
            autocommit
              If False (default), autocommit is disabled.
              If True, autocommit is enabled.
              If None, autocommit isn't set and server default is used.

            There are a number of undocumented, non-standard methods. See the
            documentation for the MySQL C API for some hints on what they do.
            """

+6
Nov 28 '16 at 7:05

You are trying to store the Unicode code point \u201c using the ISO-8859-1 / Latin-1 encoding, which cannot represent that code point. You may need to change the database to use utf-8 and store the string data using an appropriate encoding, or you may want to sanitize your inputs before storing the content, for example with something like Sam Ruby's excellent i18n tutorial. It covers the problems that windows-1252 can cause, suggests how to handle them, and links to sample code!
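A minimal sketch of the "sanitize your inputs" option in Python 3. It is not taken from the linked tutorial: the mapping is a small illustrative subset, and the names `PUNCTUATION_MAP` and `sanitize` are made up here.

```python
# Illustrative subset: typographic characters that cp1252 has and Latin-1 lacks,
# mapped to plain ASCII stand-ins.
PUNCTUATION_MAP = {
    0x2018: "'",    # LEFT SINGLE QUOTATION MARK
    0x2019: "'",    # RIGHT SINGLE QUOTATION MARK
    0x201C: '"',    # LEFT DOUBLE QUOTATION MARK
    0x201D: '"',    # RIGHT DOUBLE QUOTATION MARK
    0x2013: "-",    # EN DASH
    0x2026: "...",  # HORIZONTAL ELLIPSIS
}


def sanitize(text):
    """Replace the mapped typographic characters with ASCII equivalents."""
    return text.translate(PUNCTUATION_MAP)


clean = sanitize("\u201cHello\u201d \u2026")
print(clean)                         # "Hello" ...
latin1_bytes = clean.encode("latin-1")  # no longer raises UnicodeEncodeError
```

This changes the stored text, so it only makes sense when the exact typography does not matter. Characters outside the mapping will still fail to encode.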

+3
Oct 15 2018-10-15

SQLAlchemy users can simply specify their field as convert_unicode=True.

Example: sqlalchemy.String(1000, convert_unicode=True)

SQLAlchemy will then simply accept unicode objects and return them, handling the encoding itself.

Docs

+2
Jun 14 '17 at 23:21

Latin-1 (aka ISO 8859-1) is a single-octet character encoding scheme, and you can't fit \u201c (“) into a byte.

Did you mean to use UTF-8 encoding?

+1
Oct 15 2018-10-15

Python: you will need to add # -*- encoding: UTF-8 -*- as the first line of the Python file, and then append the following to the text you want to encode: .encode('ascii', 'xmlcharrefreplace'). This replaces every non-ASCII character with an ASCII-only XML numeric character reference.
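For reference, a short Python 3 sketch of what that error handler does to the character from the question. The sample string is made up:

```python
text = "\u201cHello\u201d"

escaped = text.encode("ascii", "xmlcharrefreplace")
print(escaped)  # b'&#8220;Hello&#8221;'
```

The result is safe to store in a Latin-1 column, but it only reads correctly where the references are later interpreted, such as in HTML or XML output. The encoding declaration on the first line only tells Python how to decode the source file itself.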

-2
Apr 11 '13 at