MySQL-python collation problem: how to force the unicode data type?

For certain purposes, I had to change the collation of some fields from utf8_unicode_ci to utf8_bin in the database. It turned out that this also changes the data types that arrive in Python.

The question is how to make mysql-python return unicode objects in Python.

Here is an example that shows the problem (an explicit charset forces use_unicode=1):

    >>> import MySQLdb
    >>> con = MySQLdb.connect(..., charset='utf8')
    >>> c = con.cursor()
    >>> c.execute('SELECT %s COLLATE utf8_bin', u'\u043c')
    1L
    >>> c.fetchone()
    ('\xd0\xbc',)
    >>> c.description
    (("'\xd0\xbc' COLLATE utf8_bin", 253, 2, 3, 3, 31, 0),)
    >>> c.execute('SELECT %s COLLATE utf8_unicode_ci', u'\u043c')
    1L
    >>> c.fetchone()
    (u'\u043c',)
    >>> c.description
    (("'\xd0\xbc' COLLATE utf8_unicode_ci", 253, 2, 3, 3, 31, 0),)

In my database, the fields are of type VARCHAR, but after the change they behave like BINARY, which is not what I want.

2 answers

It turns out the problem is rather awkward. In short, most MySQL string types and their variants are mapped to a single data type in the MySQL client interface, distinguished only by an additional BINARY flag.

Thus, MySQL VARCHAR, VARBINARY and string literals are all mapped to the same type, MySQLdb.constants.FIELD_TYPE.VAR_STRING, in the column type definitions, but with the additional MySQLdb.constants.FLAG.BINARY flag when the type is VARBINARY or the string is collated with a *_bin collation.

Even though there is a type MySQLdb.constants.FIELD_TYPE.VARCHAR, I could not find out when it is used. As I said, MySQL VARCHAR columns are mapped to FIELD_TYPE.VAR_STRING.
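
To make the mapping concrete, here is a minimal sketch of how a column could be classified from its type code and flags. The helper describe_column is hypothetical, and the two constants are copied from the sessions in this thread (253 from c.description, 128 from the converter list) on the assumption that they equal FIELD_TYPE.VAR_STRING and FLAG.BINARY.

```python
# Values taken from the sessions in this thread, not imported from MySQLdb.
VAR_STRING = 253   # assumed to equal MySQLdb.constants.FIELD_TYPE.VAR_STRING
BINARY_FLAG = 128  # assumed to equal MySQLdb.constants.FLAG.BINARY


def describe_column(type_code, flags):
    """Classify a column as described above (hypothetical helper)."""
    if type_code != VAR_STRING:
        return "not a VAR_STRING column"
    if flags & BINARY_FLAG:
        # VARBINARY columns and *_bin collated strings both end up here.
        return "VAR_STRING with BINARY flag"
    return "VAR_STRING without BINARY flag"
```

With a live cursor, the type code is the second item of each c.description entry. The flags are, as far as I know, available as c.description_flags, but check that against your MySQLdb version.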

The solution becomes quite fragile if your application uses true binary strings (for example, you store images and fetch them over the same connection as the text), since it involves decoding all binary strings to unicode. It does work, though.

As the official documentation says:

Because MySQL returns all data as strings and expects you to convert it yourself. This would be a real pain in the ass, but in fact, _mysql can do this for you. (And MySQLdb does do this for you.) To have automatic type conversion done, you need to create a type converter dictionary, and pass this to connect() as the conv keyword parameter.

In practice, the real pain in the ass is building your own converter dictionary. But you can import the default one from MySQLdb.converters.conversions and patch it, or even patch it on the Connection instance. The trick is to remove the special converter for the FLAG.BINARY flag and add a decoder for all cases. If you explicitly pass the charset parameter to MySQLdb.connect, it forces use_unicode=1, which adds the decoder for you, but you can also do it yourself:

    >>> import MySQLdb
    >>> from MySQLdb.constants import FIELD_TYPE
    >>> con = MySQLdb.connect(**params)
    >>> con.converter[FIELD_TYPE.VAR_STRING]
    [(128, <type 'str'>), (None, <function string_decoder at 0x01FFA130>)]
    >>> # drop the FLAG.BINARY entry so every VAR_STRING value is decoded
    >>> con.converter[FIELD_TYPE.VAR_STRING] = [(None, con.string_decoder)]
    >>> c = con.cursor()
    >>> c.execute("SELECT %s COLLATE utf8_bin", u'\u043c')
    1L
    >>> c.fetchone()
    (u'\u043c',)

You may need to apply the same hack to FIELD_TYPE.STRING.
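
Why this works can be illustrated with a small model of the lookup. This is only a sketch of my understanding of how the list of (mask, function) pairs is consulted, not MySQLdb's actual code: the first entry whose mask matches the column flags wins, and an entry with None as its mask acts as the fallback. The names pick_converter and decode_utf8 are hypothetical, and bytes stands in for Python 2's str.

```python
BINARY_FLAG = 128  # the mask seen in the default converter list above


def pick_converter(converters, field_flags):
    """Return the first converter whose mask matches the column flags."""
    for mask, func in converters:
        # A mask of None matches everything; an int mask is tested bitwise.
        if mask is None or (field_flags & mask):
            return func
    return None


def decode_utf8(raw):
    # Stand-in for con.string_decoder, assuming a utf8 connection.
    return raw.decode("utf-8")


default_list = [(BINARY_FLAG, bytes), (None, decode_utf8)]
patched_list = [(None, decode_utf8)]

raw = b"\xd0\xbc"  # the bytes returned for the utf8_bin query above

# With the default list the BINARY entry matches first, so the bytes stay raw.
before = pick_converter(default_list, BINARY_FLAG)(raw)
# With the patched list only the decoder is left, so the value is decoded.
after = pick_converter(patched_list, BINARY_FLAG)(raw)
```

In this model a column without the BINARY flag reaches the decoder with either list, which matches the utf8_unicode_ci result in the question.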

Another solution is to pass an explicit use_unicode=0 to MySQLdb.connect and do all the decoding in your own code, but I wouldn't.
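
For completeness, a minimal sketch of what decoding in your own code could look like. The helper decode_row is hypothetical and assumes the connection delivers utf8 byte strings. It has the same weakness as the converter hack: real binary data gets decoded too.

```python
def decode_row(row, encoding="utf-8"):
    """Decode every byte-string value in a fetched row; leave the rest alone."""
    return tuple(
        value.decode(encoding) if isinstance(value, bytes) else value
        for value in row
    )
```

You would call it on every row returned by c.fetchone() or c.fetchall(), which is exactly the repetition that makes this approach unattractive.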

Hope this can be helpful to someone.


This is a big change from using MySQL-python at a low level, but I think it is better to use something like SQLAlchemy rather than the DB-API directly. You can then use, for example, types.Unicode and know that it does whatever is required to support unicode for the underlying DB-API driver.

Before you jump on me for not answering the question directly, consider this: mysql-python, aka MySQLdb, is just one of several DB-API drivers for MySQL. MySQLdb will probably keep being maintained, but there are circumstances (for example, switching to Python 3.x, or a host where you cannot install binary modules) that may force you to use something else in the future, such as oursql or myconnpy. The people who make SQLAlchemy have spent a lot of effort supporting multiple DB-API drivers, and in the case of mysql-python they have even worked around serious bugs in the past. With SQLAlchemy, switching to another driver is as easy as changing the connection URL, and it makes sure that anything to do with type coercion is handled as you expect.

To use this, however, you will need to define your tables as SQLAlchemy schemas and use its query API, but you get a lot in return.


Source: https://habr.com/ru/post/1239640/

