It turns out the problem is rather inconvenient. In short, most MySQL string types and their variants are mapped to a single data type in the MySQL interface, with an optional BINARY flag.
Thus, MySQL VARCHAR, VARBINARY and string literals are all mapped to the same MySQLdb.constants.FIELD_TYPE.VAR_STRING type in the column type definitions, but with the additional MySQLdb.constants.FLAG.BINARY flag when the type is VARBINARY or the string uses a *_bin collation.
There is also a MySQLdb.constants.FIELD_TYPE.VARCHAR type, but I could not find out when it is used. As I said, MySQL VARCHAR columns are mapped to FIELD_TYPE.VAR_STRING.
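To make the mapping concrete, here is a minimal sketch of how a column's type code and flags could be told apart. The numeric values are assumed to mirror MySQLdb.constants.FIELD_TYPE.VAR_STRING (253) and MySQLdb.constants.FLAG.BINARY (128), and describe_column is a hypothetical helper written for illustration, not part of MySQLdb.

```python
# Assumed to mirror MySQLdb.constants.FIELD_TYPE.VAR_STRING and FLAG.BINARY.
VAR_STRING = 253
BINARY_FLAG = 128


def describe_column(type_code, flags):
    """Hypothetical helper: label a column from its type code and flags."""
    if type_code != VAR_STRING:
        return "other"
    # VARCHAR, VARBINARY and string literals all arrive as VAR_STRING;
    # only the BINARY flag tells them apart.
    if flags & BINARY_FLAG:
        return "binary string"
    return "text string"
```

Note that a VARCHAR column with a *_bin collation would land in the "binary string" branch here, which is exactly what makes the default conversion awkward.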
The solution becomes quite fragile if your application uses true binary strings (for example, you store images and fetch them over the same connection as the text), since it involves decoding all binary strings to unicode. It does work, though.
As the official docs say:
MySQL returns all data as strings and expects you to convert it yourself. That would be a real pain in the ass, but in fact, _mysql can do it for you. (And MySQLdb does this for you.) To perform automatic type conversion, you need to create a dictionary of type converters and pass it to connect() as the conv keyword parameter.
In practice, the real pain in the ass can be building your own converter dictionary. But you can import the default one from MySQLdb.converters.conversions and patch it, or even patch it on the Connection instance. The trick is to remove the special converter for the FLAG.BINARY flag and add a decoder for all cases. If you explicitly pass the charset parameter to MySQLdb.connect, it forces use_unicode=1, which adds the decoder for you, but you can do it yourself:
>>> import MySQLdb
>>> from MySQLdb.constants import FIELD_TYPE
>>> con = MySQLdb.connect(**params)
>>> con.converter[FIELD_TYPE.VAR_STRING]
[(128, <type 'str'>), (None, <function string_decoder at 0x01FFA130>)]
>>> con.converter[FIELD_TYPE.VAR_STRING] = [(None, con.string_decoder)]
>>> c = con.cursor()
>>> c.execute("SELECT %s COLLATE utf8_bin", (u'\u043c',))
1L
>>> c.fetchone()
(u'\u043c',)
You may need to apply the same hack to FIELD_TYPE.STRING.
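The effect of the hack can be sketched without a database. The snippet below is a simplified model, assuming converter lists are scanned in order and the first entry whose mask matches the column flags wins (None matching everything), which is how the list in the session above appears to behave. The constants are assumed values of FIELD_TYPE.VAR_STRING, FIELD_TYPE.STRING and FLAG.BINARY; pick_converter, decode_everything and utf8_decoder are hypothetical stand-ins, not MySQLdb functions.

```python
VAR_STRING = 253   # assumed value of FIELD_TYPE.VAR_STRING
STRING = 254       # assumed value of FIELD_TYPE.STRING
BINARY_FLAG = 128  # assumed value of FLAG.BINARY


def utf8_decoder(raw):
    """Stand-in for the connection's string decoder."""
    return raw.decode("utf-8")


def pick_converter(entries, flags):
    """Simplified model: the first entry whose mask matches the flags wins."""
    for mask, func in entries:
        if mask is None or (mask & flags):
            return func
    return None


def decode_everything(converters, decoder, field_types=(VAR_STRING, STRING)):
    """Return a copy in which the given field types always use the decoder."""
    patched = dict(converters)
    for field_type in field_types:
        patched[field_type] = [(None, decoder)]
    return patched


# Stand-in for the default mapping: values flagged BINARY stay raw bytes.
defaults = {
    field_type: [(BINARY_FLAG, bytes), (None, utf8_decoder)]
    for field_type in (VAR_STRING, STRING)
}
patched = decode_everything(defaults, utf8_decoder)

raw = b"\xd0\xbc"  # UTF-8 bytes of U+043C
before = pick_converter(defaults[VAR_STRING], BINARY_FLAG)(raw)  # stays bytes
after = pick_converter(patched[VAR_STRING], BINARY_FLAG)(raw)    # decoded text
```

With the real library, a dictionary patched this way would be the one passed to connect() as the conv keyword, or the same assignment could be made on con.converter as in the session above.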
Another solution is to pass use_unicode=0 explicitly to MySQLdb.connect and do all the decoding in your own code, but I wouldn't.
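For completeness, manual decoding could look roughly like the sketch below. It assumes rows come back as tuples of raw byte strings and that you know which column positions hold text; decode_row is a hypothetical helper and UTF-8 is an assumed encoding.

```python
def decode_row(row, text_columns, encoding="utf-8"):
    """Decode only the columns known to hold text; leave the rest untouched."""
    return tuple(
        value.decode(encoding)
        if index in text_columns and isinstance(value, bytes)
        else value
        for index, value in enumerate(row)
    )


# Example row: a text value, an image header kept as raw bytes, and a number.
row = (b"\xd0\xbc", b"\x89PNG\r\n", 7)
decoded = decode_row(row, text_columns={0})
```

The upside is that true binary columns are never touched; the downside is that every query site has to know its own column layout.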
Hope this can be helpful to someone.