Semicolon as a URL query separator

Although it is strongly recommended (W3C source, via Wikipedia) that web servers support the semicolon as a separator of URL query items (in addition to the ampersand), this recommendation does not seem to be generally followed.

For example, compare

http://www.google.com/search?q=nemo&oe=utf-8

http://www.google.com/search?q=nemo;oe=utf-8

results. (In the latter case the semicolon is, or was at the time of writing, treated as an ordinary string character, as if the URL were: http://www.google.com/search?q=nemo%3Boe=utf-8 )

Although the first URL parsing library I tried behaves well:

>>> from urlparse import urlparse, parse_qs
>>> url = 'http://www.google.com/search?q=nemo;oe=utf-8'
>>> parse_qs(urlparse(url).query)
{'q': ['nemo'], 'oe': ['utf-8']}
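For readers on Python 3, here is a minimal sketch of the same check using urllib.parse. It assumes Python 3.9.2 or later (or a patched earlier release), where parse_qs accepts a separator argument and splits only on & by default, so the semicolon behaviour in the transcript above has to be requested explicitly.

```python
from urllib.parse import urlparse, parse_qs

url = 'http://www.google.com/search?q=nemo;oe=utf-8'
query = urlparse(url).query

# Default behaviour: only '&' separates pairs, so the semicolon stays in the value.
default_pairs = parse_qs(query)

# Explicitly ask for ';' as the pair separator instead.
semicolon_pairs = parse_qs(query, separator=';')

print(default_pairs)
print(semicolon_pairs)
```

Note that separator takes a single separator string, so one call splits on either & or ; but not on both.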

What is the current status of accepting the semicolon as a separator, and what are the potential issues or interesting notes (from both a server and a client point of view)?

+51
url query-string parsing webserver
Aug 14 '10 at 1:44
4 answers

The 1999 W3C Recommendation is obsolete. The current status, according to the 2014 W3C Recommendation, is that the semicolon is now illegal as a parameter separator:

To decode application/x-www-form-urlencoded payloads, the following algorithm should be used. [...] The output of this algorithm is a sorted list of name-value pairs. [...]

  • Let strings be the result of strictly splitting the string payload on U+0026 AMPERSAND characters (&).

In other words, ?foo=bar;baz means that the parameter foo will have the value bar;baz, whereas ?foo=bar;baz=sna should result in foo having the value bar;baz=sna (although this is technically illegal, since the second = should be escaped to %3D).
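The splitting rule quoted above can be sketched in a few lines. This is a simplified illustration, not the full algorithm from the Recommendation (it ignores character-set handling and other special cases), and the function name decode_form_urlencoded is made up for this example.

```python
from urllib.parse import unquote_plus


def decode_form_urlencoded(payload):
    """Decode a query string, treating only '&' as the pair separator.

    Hypothetical helper for illustration; it covers only the splitting
    and percent-decoding part of the algorithm.
    """
    pairs = []
    for chunk in payload.split('&'):
        if not chunk:
            continue  # ignore empty chunks, e.g. in 'a=1&&b=2'
        # Split on the first '=' only; later '=' characters stay in the value.
        name, _, value = chunk.partition('=')
        pairs.append((unquote_plus(name), unquote_plus(value)))
    return pairs


print(decode_form_urlencoded('foo=bar;baz'))
print(decode_form_urlencoded('foo=bar;baz=sna'))
```

Because the semicolon is never treated as a separator here, it simply ends up inside the value of foo in both calls.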

+19
Nov 23 '16 at 15:40

As long as your HTTP server and your server-side application accept semicolons as separators, you should be good to go. I don't see any drawbacks. As you said, the W3C spec is on your side:

We recommend that HTTP server implementors, and in particular, CGI implementors support the use of ";" in place of "&" to save authors the trouble of escaping "&" characters in this manner.

+17
Aug 14 '10 at 1:59

I agree with Bob Aman. The W3C specification is intended to make it easier to write hyperlinks to URLs that look like GET form submissions (for example, http://www.host.com/?x=1&y=2 ). In this context, the ampersand conflicts with the syntax for character entity references, which all begin with an ampersand (for example, &quot;). Therefore, the W3C recommends that web servers allow the semicolon as a field separator instead of the ampersand, to make these URLs easier to write. But this solution requires authors to remember that the ampersand must be replaced by something, and that ; is an equally valid field separator, even though web browsers universally use ampersands in the URL when submitting forms. That is arguably harder than simply remembering to replace the ampersand with &amp; in these links, as would be done elsewhere in the document.

Worse, until all web servers accept semicolons as field separators, URL authors can use this shortcut only for some hosts and must use &amp; for the others. They will also have to change their code later if a given host stops accepting semicolon separators. That is certainly more complicated than just using &amp;, which will work on every server forever. This, in turn, removes any incentive for web servers to accept semicolons as field separators. Why bother, when everyone is already changing the ampersand to &amp; rather than to ; ?

+5
Jul 11 '14 at 9:53

In short, HTML is a big mess (due to its leniency), and using semicolons simplifies it A LOT. By my estimate, once I factor in the complications I have run into, using ampersands as the separator makes the whole process about three times as complicated as using semicolons instead!

I am a .NET programmer and, as far as I know, .NET does not natively allow ';' separators, so I wrote my own parsing and handling methods, because I saw tremendous value in using semicolons rather than the already problematic system of ampersand separators. Unfortunately, very respectable people (like @Bob Aman in another answer) do not see why using the semicolon is far better and far simpler than using ampersands. So I will now share a few points to persuade other respectable developers who do not yet recognize the value of semicolons:

Using a query string like "?a=1&b=2" in an HTML page is improper (without HTML-encoding it first), but most of the time it works. That, however, is only because most browsers are tolerant, and this tolerance can lead to hard-to-find bugs when, for instance, the value of a key-value pair is placed in a URL in an HTML page without proper encoding (directly, as '?a=1&b=2' in the HTML source). A query string such as "?who=me+&+you" is problematic, too.
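To illustrate why a raw ampersand inside a value is a problem, here is a small sketch with Python's urllib.parse (the example value 'me & you' is an assumption about what the query string above was meant to carry). A parser that splits on & cuts the value off at the raw ampersand, whereas percent-encoding the value first keeps it intact.

```python
from urllib.parse import urlencode, parse_qs

# The raw '&' inside the value is read as a pair separator.
unencoded = 'who=me+&+you'
broken = parse_qs(unencoded)

# URL-encoding the value first turns '&' into %26 and spaces into '+'.
encoded = urlencode({'who': 'me & you'})
intact = parse_qs(encoded)

print(broken)
print(encoded)
print(intact)
```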

We humans have biases and can argue about them all day long, so recognizing our biases is very important. For example, I admit that I simply think separating with ';' looks "cleaner". I admit that my "cleaner" opinion is purely a bias. Another developer may hold the opposite and equally valid bias. So my bias on this one point is no more correct than the opposite bias.

But the unbiased case for the semicolon, namely that it makes everyone's life easier in the long run, cannot reasonably be disputed once the whole picture is taken into account. In short, using semicolons makes life simpler for everyone, with one exception: the small hurdle of getting used to something new. That's all. It is always harder to change something. But the difficulty of making the change pales in comparison with the ongoing difficulty of continuing to use &amp;.

Using ; as a QueryString delimiter makes things MUCH simpler. Ampersand separators are more than twice as difficult to code properly as semicolons. (I think) most implementations are not coded properly, so most implementations are not actually twice as complicated. But then tracking down and fixing the resulting bugs costs productivity. Here I point out the 2 separate encoding steps needed to properly encode a QueryString when & is the delimiter:

  • Step 1. URL-encode both the keys and the values of the query string.
  • Step 2. Concatenate the keys and values, as in 'a=1&b=2', after they have been URL-encoded in step 1.
  • Step 3. Then HTML-encode the whole QueryString in the HTML source of the page.

So, for correct (bug-free) encoding of the URL, special encoding has to be done twice, and not only that, but these are two distinct, different kinds of encoding. The first is URL encoding and the second is HTML encoding (for the HTML source code). If either of these is done incorrectly, then I can find you a bug. But step 3 is different for XML. XML requires XML character entity encoding instead (which is almost identical). My point is that the last encoding depends on the context of the URL, whether that is an HTML web page or an XML document.
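The three steps listed above could look like this in Python. This is only a sketch under assumptions: the parameter values and the /search path are invented for the example, and html.escape stands in for whatever HTML encoder the page generator actually uses.

```python
from html import escape
from urllib.parse import quote_plus

# Hypothetical example data: one value contains a literal ampersand.
params = [('a', '1'), ('who', 'me & you')]

# Step 1: URL-encode every key and every value.
encoded = [(quote_plus(key), quote_plus(value)) for key, value in params]

# Step 2: join the encoded pairs with the ampersand separator.
query = '&'.join(f'{key}={value}' for key, value in encoded)

# Step 3: HTML-encode the whole URL before writing it into the page source.
href = escape('/search?' + query)
link = f'<a href="{href}">search</a>'

print(query)
print(link)
```

Skipping step 3 would leave a bare & in the attribute value, which is the case the answer describes as working only because browsers are lenient.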

Now, with the much simpler semicolon separators, the process looks like this:

  • 1: URL-encode the keys and values,
  • 2: concatenate the values together. (No encoding needed for step 3.)
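A sketch of the two-step variant in Python, using the same invented example data as one might use for the ampersand case. It only shows that the HTML-encoding pass leaves a semicolon-separated query string unchanged; whether the receiving server actually splits on ';' is a separate matter and has to be checked for each server.

```python
from html import escape
from urllib.parse import quote_plus

# Hypothetical example data: one value contains a literal ampersand.
params = [('a', '1'), ('who', 'me & you')]

# Steps 1 and 2: URL-encode the keys and values, then join them with ';'.
# quote_plus turns a literal ';' or '&' inside a value into %3B or %26,
# so the only raw semicolons left are the separators.
query = ';'.join(f'{quote_plus(key)}={quote_plus(value)}' for key, value in params)

# HTML-encoding has nothing left to change in this string.
html_safe = escape(query)

print(query)
print(html_safe == query)
```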

I think most web developers skip step 3 because browsers are so lenient. But this leads to bugs, and to further complications when hunting down those bugs, when users cannot do things they could have done had the bugs not been there, when bug reports have to be written, and so on.

Another complication in real-world use comes when writing XML documentation markup in my source code, in both C# and VB.NET. Since & must be encoded, it is a real drag, literally, on my productivity. This extra step 3 also makes the source code harder to read. So this readability cost applies not only to HTML and XML but also to other uses, such as C# and VB.NET code, because their documentation uses XML documentation markup. Thus the encoding complication of step 3 spreads to other applications.

So, in conclusion, using ; as the separator is simpler because the (correct) process with the semicolon is what one would normally expect the process to be: only one encoding step needs to be performed.

Perhaps this was not too confusing. But all of the confusion and difficulty comes from using a separator character that has to be HTML-encoded. So & is the culprit, and the semicolon removes all of that complication.

(I will point out that my 3-step versus 2-step process above is usually how many steps most applications would take. However, for completely robust code, all 3 steps are needed no matter which separator is used. But in my experience, most implementations are not fully robust. Therefore, using the semicolon as the query string separator would make life easier for more people, with fewer website bugs and glitches, if everyone adopted the semicolon as the default separator character instead of the ampersand.)

+2
Jan 25 '16 at 18:55
