Unicode Hell in Pyramid: MySQL → SQLAlchemy → Pyramid → JSON

Background

I have a real mess with Unicode and Python. This seems to be a common problem, and I have tried the solutions posted elsewhere, but I just can't get around it.

Setup

MySQL database setup

  • collation_database: utf8_general_ci
  • character_set_database: utf8

SQLAlchemy Model

    class Product(Base):
        id = Column('product_id', Integer, primary_key=True)
        name = Column('product_name', String(64))  # Tried using Unicode() but it didn't help

Pyramid view

    @view_config(renderer='json', route_name='products_search')
    def products_search(request):
        json_products = []
        term = "%%%s%%" % request.params['term']
        products = dbsession.query(Product).filter(Product.name.like(term)).all()
        for prod in products:
            json_prod = {'id': prod.id, 'label': prod.name, 'value': prod.name,
                         'sku': prod.sku, 'price': str(prod.price[0].price)}
            json_products.append(json_prod)
        return json_products

Problem

I get encoding errors raised by the json module (which is invoked as the renderer for this route), as follows:

 UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 37: invalid start byte 

The culprit is the "–" character (a dash) in the prod.name value. Full stack trace here. If the returned products do not contain a "–", everything works fine!

What I tried

I tried encoding and decoding the values in various ways before returning the json_products variable.

1 answer

The comment above is correct, but more specifically, you can replace 'label': prod.name with 'label': prod.name.decode("cp1252"). You should probably do the same for every string in the json_prod dictionary, since you are likely to see cp1252-encoded characters elsewhere once the application is in real use.
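A minimal sketch of why the cp1252 decode helps, assuming the driver hands back raw byte strings. Byte 0x96 is not a valid UTF-8 start byte, but in cp1252 it maps to the en dash (U+2013). The sample value raw_name is made up for illustration and is not taken from the question's data.

```python
import json

# Hypothetical product name as raw bytes, with 0x96 where the dash is.
raw_name = b"Widget \x96 Large"

try:
    raw_name.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    # 0x96 is a UTF-8 continuation byte, so it cannot start a character.
    utf8_ok = False

# cp1252 maps 0x96 to U+2013 (en dash), so this decode succeeds.
label = raw_name.decode("cp1252")

# Once the value is text rather than bytes, json can serialize it.
payload = json.dumps({"label": label})
```

If the UTF-8 decode fails and the cp1252 decode yields a sensible dash, that is a strong hint the stored data was written as cp1252 rather than UTF-8.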

On that note, depending on where these strings come from and how widely that source is used in your application, you may run into the same problem elsewhere, usually when you least expect it. To investigate further, find out what the source of these strings is, and whether you can decode and re-encode them at a lower level to prevent most future problems of this kind.
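One way to centralize the decoding instead of calling .decode() field by field is a small helper applied to every value before it reaches the renderer. This is only a sketch: to_text and decode_values are hypothetical names, and the fallback order (UTF-8 first, then cp1252) is an assumption about the data, not something the question confirms.

```python
def to_text(value, encodings=("utf-8", "cp1252")):
    """Return text for a byte string, trying each encoding in order."""
    if not isinstance(value, bytes):
        return value  # already text, or not a string at all
    for encoding in encodings:
        try:
            return value.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: replace undecodable bytes instead of raising.
    return value.decode(encodings[0], errors="replace")


def decode_values(record):
    """Apply to_text to every value of a dict such as json_prod."""
    return {key: to_text(val) for key, val in record.items()}


# Hypothetical row, shaped like the json_prod dictionary in the question.
row = {"id": 7, "label": b"Widget \x96 Large", "sku": b"W-7"}
clean = decode_values(row)
```

Trying UTF-8 first keeps correctly encoded rows intact, and only rows that fail fall back to cp1252. Fixing the encoding where the data is written remains the more durable solution.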


Source: https://habr.com/ru/post/895532/

