How do you store accented characters coming from a web service to a database?

I have the following word that I get through the web service: André

In Python, the value looks like this: "Andr\u00c3\u00a9". The input is then decoded using json.loads:

    >>> import json
    >>> json.loads('{"name":"Andr\\u00c3\\u00a9"}')
    {u'name': u'Andr\xc3\xa9'}

I then store the value above in a utf8 MySQL database through Django, like this:

 SomeObject.objects.create(name=u'Andr\xc3\xa9') 

Querying the name column from the mysql shell or displaying it on a web page gives: André

The web page is displayed in utf8:

 <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> 

My database is configured in utf8:

    mysql> SHOW VARIABLES LIKE 'collation%';
    +----------------------+-----------------+
    | Variable_name        | Value           |
    +----------------------+-----------------+
    | collation_connection | utf8_general_ci |
    | collation_database   | utf8_unicode_ci |
    | collation_server     | utf8_unicode_ci |
    +----------------------+-----------------+
    3 rows in set (0.00 sec)

    mysql> SHOW VARIABLES LIKE 'character_set%';
    +--------------------------+----------------------------+
    | Variable_name            | Value                      |
    +--------------------------+----------------------------+
    | character_set_client     | utf8                       |
    | character_set_connection | utf8                       |
    | character_set_database   | utf8                       |
    | character_set_filesystem | binary                     |
    | character_set_results    | utf8                       |
    | character_set_server     | utf8                       |
    | character_set_system     | utf8                       |
    | character_sets_dir       | /usr/share/mysql/charsets/ |
    +--------------------------+----------------------------+
    8 rows in set (0.00 sec)

How can I get the word André from a web service, store it correctly in the database without losing data and display it on the web page in its original form?

1 answer

The error is already in the string you pass to json.loads(). \u00c3 is "Ã" (A with tilde) and \u00a9 is the copyright sign "©". The correct escape for é would be \u00e9.
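One way to check which characters the escapes in the payload actually denote is Python's standard unicodedata module. This is only an illustrative sketch: the variable names are made up for the example, and the second JSON string is what a correct payload would presumably look like.

```python
import json
import unicodedata

# The payload as received (from the question) and a correctly escaped one.
received = json.loads('{"name":"Andr\\u00c3\\u00a9"}')['name']
expected = json.loads('{"name":"Andr\\u00e9"}')['name']

# Look up the official Unicode names of everything after "Andr".
received_names = [unicodedata.name(ch) for ch in received[4:]]
expected_names = [unicodedata.name(ch) for ch in expected[4:]]

print(received_names)
print(expected_names)
```

The received name has six characters instead of five, so the data is already wrong before it reaches Django or MySQL.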

The string was probably encoded as UTF-8 by the sender and then decoded as ISO-8859-1 by the receiver.

For example, if you run the following Python script:

    # -*- encoding: utf-8 -*-
    import json

    data = {'name': u'André'}
    print('data: {0}'.format(repr(data)))

    code = json.dumps(data)
    print('code: {0}'.format(repr(code)))

    conv = json.loads(code)
    print('conv: {0}'.format(repr(conv)))

    name = conv['name']
    print(u'Name is {0}'.format(name))

The result should look like this:

    data: {'name': u'Andr\xe9'}
    code: '{"name": "Andr\\u00e9"}'
    conv: {u'name': u'Andr\xe9'}
    Name is André
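By contrast, the suspected mix-up (UTF-8 bytes read as ISO-8859-1) can be reproduced in a few lines, and reversed if the web service itself cannot be fixed. The sketch below is a workaround under that assumption, not a definitive fix; repair_mojibake is a hypothetical helper name, and it only helps when the text really went through exactly this mis-decoding.

```python
import json


def repair_mojibake(text):
    """Undo a UTF-8 -> ISO-8859-1 mis-decoding (hypothetical helper).

    Returns the text unchanged if it does not look like such a mix-up.
    """
    try:
        # Recover the original bytes, then decode them as UTF-8.
        return text.encode('latin-1').decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# Reproduce the problem: encode correctly, decode with the wrong charset.
original = u'Andr\xe9'
garbled = original.encode('utf-8').decode('latin-1')

# Apply the repair to the payload from the question.
payload = json.loads('{"name":"Andr\\u00c3\\u00a9"}')
fixed = repair_mojibake(payload['name'])

# json.dumps escapes non-ASCII, so this prints safely on any terminal.
print(json.dumps({'garbled': garbled, 'fixed': fixed}))
```

The repaired value could then be passed to SomeObject.objects.create(name=fixed). Fixing the encoding at the sender remains the cleaner solution, because this repair cannot tell garbled text apart from a string that legitimately contains "Ã©".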

Handling Unicode in Python 2.x can sometimes be a nuisance. Unfortunately, at the time of writing, Django does not yet support Python 3.


Source: https://habr.com/ru/post/1308928/