Are Flask headers not converted to Unicode?

I am developing a small web service in Python using:

  • Flask (v. 0.8)
  • storm ORM (v. 0.19)
  • Apache with mod_wsgi

I have my own HTTP header, Unison-UUID, which I use at some point to look up information in my database.

Here is a snippet (slightly rewritten for simplicity) that I am having problems with:

    uuid = flask.request.headers['Unison-UUID']
    store = storm.locals.Store(my_database)
    user = store.get(models.User, uuid)

The User class is something like this:

    class User(Storm):
        uuid = Unicode(primary=True)
        # Other columns....

The above code fails with the following traceback:

      File "/Users/lum/Documents/unison-recsys/www/api/unison/unison.py", line 27, in decorated
        user = g.store.get(models.User, uuid)
      File "/Users/lum/Documents/unison-recsys/venv/lib/python2.6/site-packages/storm/store.py", line 165, in get
        variable = column.variable_factory(value=variable)
      File "/Users/lum/Documents/unison-recsys/venv/lib/python2.6/site-packages/storm/variables.py", line 396, in parse_set
        % (type(value), value))
    TypeError: Expected unicode, found <type 'str'>: '00000000-0000-0000-0000-000000000009'

I really don't understand why this is happening and what I can do about it. I thought Flask was 100% unicode.

The quick fix I found is to decode the header value, i.e. uuid = uuid.decode('utf-8'). Is this really what needs to be done? It seems a bit hacky. Is there no way to get unicode directly, without having to "decode" it manually?

2 answers

At http://flask.pocoo.org/docs/api/#flask.request we read

The request object is an instance of a Request subclass and provides all of the attributes Werkzeug defines.

The word Request links to http://werkzeug.pocoo.org/docs/wrappers/#werkzeug.wrappers.Request, where we read

The Request and Response classes subclass the BaseRequest and BaseResponse classes and implement all the mixins Werkzeug provides:

The word BaseRequest links to http://werkzeug.pocoo.org/docs/wrappers/#werkzeug.wrappers.BaseRequest, where we read

headers
The headers from the WSGI environ as immutable EnvironHeaders.

The word EnvironHeaders links to http://werkzeug.pocoo.org/docs/datastructures/#werkzeug.datastructures.EnvironHeaders, where we read

This provides the same interface as Headers and is constructed from a WSGI environment.

The word Headers is... no, it is not linked, but it should be linked to http://werkzeug.pocoo.org/docs/datastructures/#werkzeug.datastructures.Headers, where we read

Headers is mostly compatible with the Python wsgiref.headers.Headers class

where the phrase wsgiref.headers.Headers links to http://docs.python.org/dev/library/wsgiref.html#wsgiref.headers.Headers, where we read

Create a mapping-like object wrapping headers, which must be a list of header name/value tuples as described in PEP 3333.

The phrase PEP 3333 links to http://www.python.org/dev/peps/pep-3333/, where there is no explicit definition of what type headers should be, but after searching for the word "headers" for a while we find this statement

WSGI therefore defines two kinds of "string":

  • "Native" strings (which are always implemented using the type named str) that are used for request/response headers and metadata
  • "Bytestrings" (which are implemented using the bytes type in Python 3, and str elsewhere), that are used for the bodies of requests and responses (e.g. POST/PUT input data and HTML page outputs).

This is why in Python 2 you get header values as str, not unicode.

Now let's move on to decoding.

Neither your .decode('utf-8') nor mensi's .decode('ascii') (nor blindly expecting any other encoding) is universally correct, because "in theory, HTTP header field values can transport anything; the tricky part is to get all parties (sender, receiver, and intermediaries) to agree on the encoding". Having said that, I think you should follow Julian Reschke's advice

Thus, the safe way to do this is to stick to ASCII, and choose an encoding on top of that, such as the one defined in RFC 5987.

after verifying that the user agents (browsers) that you support have implemented it.

The title of RFC 5987 is "Character Set and Language Encoding for Hypertext Transfer Protocol (HTTP) Header Field Parameters".
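To make the RFC 5987 idea concrete, here is a minimal sketch of decoding its extended value syntax (charset'language'percent-encoded-bytes). The function name decode_rfc5987 is hypothetical, and the sketch assumes well-formed input: it does not validate the charset or the language tag the way a full implementation should. The two sample values are the examples given in the RFC itself.

```python
try:
    from urllib.parse import unquote_to_bytes  # Python 3
except ImportError:
    # Python 2: urllib.unquote on a byte str returns the raw bytes
    from urllib import unquote as unquote_to_bytes


def decode_rfc5987(ext_value):
    """Decode an RFC 5987 ext-value such as "UTF-8''%e2%82%ac%20rates".

    Hypothetical helper for illustration; assumes the value is well formed.
    """
    # ext-value = charset "'" [ language ] "'" value-chars
    charset, _language, encoded = ext_value.split("'", 2)
    # Undo the percent-encoding, then decode the bytes with the declared charset
    return unquote_to_bytes(encoded).decode(charset)


euro_rates = decode_rfc5987("UTF-8''%e2%82%ac%20rates")
pound_rates = decode_rfc5987("iso-8859-1'en'%A3%20rates")
```

The header itself stays pure ASCII on the wire; the non-ASCII text only appears after this second decoding step, which is why both sides have to agree to use the scheme.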


Header values are ASCII; see the related question from Acorn.

You can either decode it manually, as you did (although you should use uuid.decode('ascii') rather than utf-8), or change your field to RawStr instead of Unicode.
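A rough sketch of the manual-decoding option, assuming the header is expected to be pure ASCII (which holds for a UUID). header_to_unicode is a hypothetical helper name, not something Flask or Storm provides:

```python
def header_to_unicode(value, encoding='ascii'):
    """Return a header value as text (hypothetical helper, not a Flask/Storm API).

    On Python 2 a WSGI header is a byte str, so it is decoded strictly:
    a non-ASCII byte raises UnicodeDecodeError instead of being guessed at.
    On Python 3 the header is already text and is returned unchanged.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# The byte literal stands in for what flask.request.headers returns on Python 2
uuid = header_to_unicode(b'00000000-0000-0000-0000-000000000009')
```

In the question's code this would become uuid = header_to_unicode(flask.request.headers['Unison-UUID']) before the call to store.get(models.User, uuid). Decoding strictly means a malformed header fails loudly at the boundary rather than deep inside Storm.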


Source: https://habr.com/ru/post/913038/