Parsing HTTP Header Values: Quoting, RFC 5987, MIME, etc.

What confuses me is how to decode HTTP header values.

Example Header:
Some-Header: "quoted string?"; *utf-8'en'Weirdness

Can a header value be quoted? What about encoding a " itself? Is ' a valid quote character? What does the semicolon (;) mean? Can a value parser for an HTTP header be considered a MIME parser?

I am making a transparent proxy that needs to transparently handle and modify many in-the-wild header fields. That is why I need a detailed description of the format.

Answer:

Can header values be quoted?

If you mean whether the RFC 5987 parameter production applies to the main part of the header value, then no.

 Some-Header: "foo"; bar*=utf-8'en'bof 

Here the main part of the header value would probably be "foo", including the quotes, but...
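For a header that is known to use parameter syntax, the split might look like the minimal Python sketch below. The function names are hypothetical, not from any library. It assumes the RFC 7230 quoted-string grammar (a backslash escapes the next character, so \" encodes a literal quote, while ' is an ordinary token character, not a quote) and the RFC 5987 ext-value form charset'language'percent-encoded for parameters whose name ends in *. The main value is returned untouched, quotes and all.

```python
from urllib.parse import unquote


def split_params(value: str) -> list[str]:
    """Split on ';' outside quoted strings, honouring backslash escapes."""
    parts, buf = [], []
    in_quotes = escaped = False
    for ch in value:
        if escaped:                      # character after a backslash: literal
            escaped = False
        elif in_quotes and ch == "\\":
            escaped = True
        elif ch == '"':
            in_quotes = not in_quotes
        elif ch == ";" and not in_quotes:
            parts.append("".join(buf).strip())
            buf = []
            continue
        buf.append(ch)                   # keep raw text; unquoting happens later
    parts.append("".join(buf).strip())
    return parts


def unquote_string(s: str) -> str:
    """Remove surrounding quotes and resolve backslash quoted-pairs."""
    if len(s) < 2 or not (s[0] == s[-1] == '"'):
        return s                         # a bare token: nothing to undo
    out, chars = [], iter(s[1:-1])
    for ch in chars:
        out.append(next(chars, "") if ch == "\\" else ch)
    return "".join(out)


def decode_ext_value(s: str) -> tuple[str, str]:
    """Decode an RFC 5987 ext-value and return (language, text).

    Raises ValueError if the two ' separators are missing, LookupError for
    an unknown charset and UnicodeDecodeError for bytes invalid in it.
    """
    charset, language, encoded = s.split("'", 2)
    return language, unquote(encoded, encoding=charset, errors="strict")


def parse_header(value: str) -> tuple[str, dict[str, str]]:
    """Hypothetical helper: main value (raw) plus decoded parameters."""
    main, *raw_params = split_params(value)
    params = {}
    for raw in raw_params:
        if not raw:                      # tolerate a trailing ';'
            continue
        name, _, val = raw.partition("=")
        name, val = name.strip().lower(), val.strip()
        if name.endswith("*"):           # RFC 5987 extended parameter
            params[name] = decode_ext_value(val)[1]
        else:
            params[name] = unquote_string(val)
    return main, params
```

Applied to the example above, parse_header keeps "foo" with its quotes as the main value and decodes bar* to bof. Per the rest of this answer, such a parser should only be run on headers known to have this shape.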

What does the semicolon (;) mean?

The specific handling is defined separately for each named header. So a semicolon is significant in, for example, Content-Disposition, but not in Content-Length.

Obviously this is not a very satisfactory solution, but it is what we are stuck with.
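To make the per-header rule concrete, here is a small Python sketch of dispatching on the header name. The registry and function names are hypothetical; the grammar facts it relies on are that Content-Length is just one or more digits, so a ; there is simply an error, while a Content-Disposition value is a disposition type followed by ;-separated parameters.

```python
import re


def parse_content_length(value: str) -> int:
    # Content-Length = 1*DIGIT: a ';' is not a separator here, just invalid.
    v = value.strip()
    if not re.fullmatch(r"[0-9]+", v):
        raise ValueError(f"invalid Content-Length: {value!r}")
    return int(v)


def parse_content_disposition(value: str) -> tuple[str, str]:
    # The disposition type is a token (it cannot contain ';'), so the first
    # ';' starts the parameter list; parameters are left unparsed here.
    disposition, _, params = value.partition(";")
    return disposition.strip().lower(), params.strip()


# Hypothetical registry: the meaning of ';' is decided per header name.
PARSERS = {
    "content-length": parse_content_length,
    "content-disposition": parse_content_disposition,
}


def parse_known(name: str, value: str):
    """Parse a header this code knows about; raises KeyError otherwise."""
    return PARSERS[name.lower()](value)
```

The same character gets three different treatments: a separator in Content-Disposition, an error in Content-Length, and (via KeyError) no interpretation at all for unregistered names.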

I am making a transparent proxy that needs to transparently handle and modify many in-the-wild header fields.

You cannot handle them generically; you have to know the form of each possible header. For anything you do not recognize, do not attempt to decompose the header value. And really, there is so little support for RFC 5987 at the moment that you are unlikely to be able to do much useful handling of it.
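One way to follow that advice in a proxy, sketched in Python: keep headers as an ordered list of (name, value) pairs, rewrite only names that have a registered handler, and copy every other header verbatim without splitting or decoding it. The Location rewrite and the host names are made-up placeholders for whatever the proxy actually needs to change.

```python
from typing import Callable


def rewrite_location(value: str) -> str:
    # Placeholder policy (assumption): map an internal host to a public one.
    return value.replace("http://backend.internal/", "https://example.com/", 1)


# Only headers whose full syntax this proxy understands get a handler.
HANDLERS: dict[str, Callable[[str], str]] = {
    "location": rewrite_location,
}


def rewrite_headers(headers: list[tuple[str, str]]) -> list[tuple[str, str]]:
    out = []
    for name, value in headers:
        handler = HANDLERS.get(name.lower())
        # Unrecognized headers are copied as-is: no splitting, no decoding.
        out.append((name, handler(value) if handler else value))
    return out
```

A list rather than a dict is used so that order and repeated fields, such as several Set-Cookie lines, survive unchanged.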

The status quo today is that non-ASCII characters in header values do not work well enough across browsers to be used at all, whether encoded or raw.

Luckily, they are rarely needed. The only really common use case is non-ASCII filenames in Content-Disposition, but that is easier to work around by putting the filename in a trailing segment of the URL path instead.
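A sketch of that workaround in Python, under stated assumptions: the /download/<id>/<filename> URL layout and the function names are hypothetical, and it relies on browsers falling back to the last URL path segment for the saved name when Content-Disposition carries no filename parameter. urllib.parse.quote percent-encodes the name as UTF-8, which keeps the URL itself pure ASCII.

```python
from urllib.parse import quote, unquote


def download_link(base: str, file_id: str, filename: str) -> tuple[str, str]:
    """Return (url, content_disposition) for a hypothetical download route."""
    # safe="" also encodes '/', so the name stays a single path segment.
    url = f"{base.rstrip('/')}/download/{quote(file_id, safe='')}/{quote(filename, safe='')}"
    # No filename parameter, so the header stays ASCII-only.
    return url, "attachment"


def filename_from_path(path: str) -> str:
    """Server side: the route can recover the name, or simply ignore it."""
    return unquote(path.rsplit("/", 1)[-1])
```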

Can a value parser for an HTTP header be considered a MIME parser?

No. HTTP borrows heavily from MIME and the RFC 822 family in general, but it is not part of the 822 family. It has its own low-level grammar for headers, which looks like 822 but is not quite compatible. Arbitrary MIME features cannot be used in HTTP; there has to be a standardization mechanism that drags them into HTTP explicitly, and that is what RFC 5987 does for (parts of) RFC 2231.

(See section 19.4 of RFC 2616 for a discussion of some other differences.)

In theory, a multipart/form-data submission is part of the 822 family, and you should be able to use RFC 2231 encoding in it. But in reality browsers do not support that either.


Source: https://habr.com/ru/post/904343/

