SQLAlchemy / MySQL binary blob encoded by utf-8?

I use SQLAlchemy and MySQL, with a `files` table to store files. The table is defined as follows:

```
mysql> show full columns in files;
+---------+--------------+-----------------+------+-----+---------+-------+---------------------------------+---------+
| Field   | Type         | Collation       | Null | Key | Default | Extra | Privileges                      | Comment |
+---------+--------------+-----------------+------+-----+---------+-------+---------------------------------+---------+
| id      | varchar(32)  | utf8_general_ci | NO   | PRI | NULL    |       | select,insert,update,references |         |
| created | datetime     | NULL            | YES  |     | NULL    |       | select,insert,update,references |         |
| updated | datetime     | NULL            | YES  |     | NULL    |       | select,insert,update,references |         |
| content | mediumblob   | NULL            | YES  |     | NULL    |       | select,insert,update,references |         |
| name    | varchar(500) | utf8_general_ci | YES  |     | NULL    |       | select,insert,update,references |         |
+---------+--------------+-----------------+------+-----+---------+-------+---------------------------------+---------+
```

The `content` column of type MEDIUMBLOB is where the files are stored. In SQLAlchemy, this column is declared as:

```python
__maxsize__ = 12582912  # 12 MiB
content = Column(LargeBinary(length=__maxsize__))
```
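For reference, the full declarative model around that column might look roughly like this (a sketch; only the `content` declaration comes from the question, the `File` class name and the remaining column mappings are inferred from the `show full columns` output above):

```python
from sqlalchemy import Column, DateTime, LargeBinary, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class File(Base):
    """Maps the `files` table shown above (name and structure inferred)."""

    __tablename__ = "files"
    __maxsize__ = 12582912  # 12 MiB

    id = Column(String(32), primary_key=True)
    created = Column(DateTime)
    updated = Column(DateTime)
    # On MySQL, LargeBinary with a length under 16 MiB renders as MEDIUMBLOB.
    content = Column(LargeBinary(length=__maxsize__))
    name = Column(String(500))
```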

I'm not quite sure about the difference between SQLAlchemy's `BINARY` and `LargeBinary` types, or the difference between MySQL's `VARBINARY` and `BLOB`, or whether any of that matters here.

Question: Whenever I store actual binary data in this table, i.e. a Python `bytes` object, I get the following warning:

```
.../python3.4/site-packages/sqlalchemy/engine/default.py:451: Warning: Invalid utf8 character string: 'BCB121'
  cursor.execute(statement, parameters)
```

I do not want to simply ignore the warning, and the files appear to be intact. How can I handle this warning gracefully, or better, fix its cause?

Side note: This question seems related, and it suggests that MySQL is trying to convert all incoming data to UTF-8 (this answer).

2 answers

Turns out this is a driver issue. Apparently the default MySQL driver has trouble with Python 3 and utf8 support. Installing cymysql into the Python virtual environment resolved the issue and the warnings disappeared.

The fix: find out whether MySQL is connected through a socket or a port (see here), then change the connection string accordingly. In my case, using a socket connection:

```
mysql+cymysql://user:pwd@localhost/database?unix_socket=/var/run/mysqld/mysqld.sock
```

Use the port argument otherwise.
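The two variants can be sketched as follows (credentials and paths are placeholders); SQLAlchemy's `make_url` helper shows how each connection string is parsed without actually connecting:

```python
from sqlalchemy.engine.url import make_url

# Socket connection: the unix_socket query parameter is passed
# through to the driver's connect() call.
socket_url = make_url(
    "mysql+cymysql://user:pwd@localhost/database"
    "?unix_socket=/var/run/mysqld/mysqld.sock"
)

# TCP connection: specify the port in the URL instead.
tcp_url = make_url("mysql+cymysql://user:pwd@localhost:3306/database")
```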

Edit: While this solved the encoding problem described above, it gave rise to another one: blob size. Due to a bug in CyMySQL, blobs larger than 8 MB cannot be written. Switching to PyMySQL fixed that problem, although it seems to have a similar issue with very large blobs.
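If you end up on PyMySQL, one thing worth trying for the original warning is its `binary_prefix` connection option, which makes the driver prepend MySQL's `_binary` introducer to binary parameters so the server does not attempt to interpret them as utf8 (I have not verified this against this exact setup):

```
mysql+pymysql://user:pwd@localhost/database?binary_prefix=true
```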


Not sure, but your problem may have the same roots as one I had a few years ago with Python 2.7: fooobar.com/questions/1239640/.... In short, the MySQL interface cannot tell whether you are working with a true binary string or with text stored in a binary collation (often used to work around the lack of a case-sensitive utf8 collation). Therefore MySQL bindings offer the following options:

  • return all string fields as binary strings and leave the decoding to you
  • decode only the fields that do not have the binary flag set (great fun when some fields come back as `unicode` and others as `str`)
  • force-decode all string fields to `unicode`, even true binary ones

My guess is that in your case the third option is enabled in the underlying MySQL binding, and the first suspect is your connection string (its connection parameters).
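The ambiguity is easy to demonstrate in plain Python: bytes coming out of a BLOB column are generally not valid utf8, so a driver that force-decodes everything (the third option) has to either fail or emit warnings like the one above. A sketch:

```python
# Bytes as they might come back from a MEDIUMBLOB column; the value
# mirrors the 'BCB121' hex fragment from the warning in the question.
raw = bytes.fromhex("BCB121")

# First option: the driver returns bytes and leaves decoding to you.
assert isinstance(raw, bytes)

# Third option: force-decoding true binary data as utf8 fails...
try:
    raw.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False
assert not decoded_ok  # ...which is why the driver warns instead.
```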


Source: https://habr.com/ru/post/1239639/

