How do you think Google deals with this encoding problem?

I recently ran into an encoding problem related to the way Firefox encodes URLs entered directly into the address bar. Basically, it looks like the default character encoding for such URLs is not UTF-8 (which is what most browsers use). Instead, Firefox seems to make an educated guess about which character encoding to use based on the contents of the URL.

For example, if you enter a URL with a "q" parameter directly into the address bar (I use Firefox 3.5.5), this is how the query string parameter is encoded in the HTTP request:

1) ...q=Književni → q=Knji%9Eevni (this is apparently encoded as iso-8859-1; more precisely windows-1252, where ž is byte 0x9E)
2) ...q=漢字 → q=%E6%BC%A2%E5%AD%97 (this is apparently encoded as UTF-8)
3) ...q=Književni 漢字 → q=Knji%C5%BEevni%E6%BC%A2%E5%AD%97 (this is apparently encoded as UTF-8, which is odd, because the first part of the value is the same as in 1, which was encoded as iso-8859-1).
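The byte sequences in the three cases above can be reproduced with a short Python sketch. The helper name `percent_encode` is made up for illustration, and `cp1252` (windows-1252) is used as the single-byte encoding on the assumption that this is what Firefox actually fell back to, since ž has no code point in strict iso-8859-1.

```python
from urllib.parse import quote


def percent_encode(text: str, encoding: str) -> str:
    """Percent-encode text after converting it to bytes in the given encoding.

    Hypothetical helper for illustration; safe="" escapes every reserved character.
    """
    return quote(text, safe="", encoding=encoding)


# Case 1: single-byte encoding (windows-1252 assumed)
print(percent_encode("Književni", "cp1252"))  # Knji%9Eevni

# Same text as UTF-8, as seen in the first part of case 3
print(percent_encode("Književni", "utf-8"))   # Knji%C5%BEevni

# Case 2: CJK characters cannot be represented in a single-byte Western encoding
print(percent_encode("漢字", "utf-8"))         # %E6%BC%A2%E5%AD%97
```

This is consistent with the guess that the browser uses the single-byte encoding when every character fits in it and switches to UTF-8 as soon as one character does not.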

So this really should not be a big problem, right? Well, it is not a huge one, but for me it does matter. In the application I'm working on, we have a search box in our global navigation. When a user enters a search query in our search box, the "q" parameter (as in the example above, the parameter that holds the query value) is submitted in the request encoded as UTF-8, and everything is fine.

However, the URL that then appears in the address bar shows the decoded form of that URL, so the q parameter looks like "q=Književni". Now, as I mentioned earlier, if the user then presses ENTER to resubmit what is in the address bar, the parameter "q=Književni" is encoded as iso-8859-1 and sent to our server as "q=Knji%9Eevni". The problem is that we always expect a UTF-8 encoded URL, so when we get this parameter our application does not know how to interpret it, and this can cause some strange results.
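To make the "strange results" concrete, here is a small Python sketch of what a server that assumes UTF-8 sees when it receives the single-byte form. It uses only the standard library; the variable names are illustrative and not taken from the application described above.

```python
from urllib.parse import unquote, unquote_to_bytes

raw = "Knji%9Eevni"  # the value as sent after pressing ENTER in the address bar

# Lenient decoding: the lone byte 0x9E is not valid UTF-8, so it becomes U+FFFD
lenient = unquote(raw, encoding="utf-8", errors="replace")
print(lenient == "Knji\ufffdevni")  # True

# Strict decoding: the same byte raises an exception instead
try:
    unquote_to_bytes(raw).decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
print(strict_ok)  # False
```

Either way the original "ž" is lost unless the server tries a second encoding.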

Now, I know there may not be much I can do about Firefox here. Still, I'm curious how Google handles this. Consider the following Google URLs:

http://www.google.com/search?q=Knji%C5%BEevni
http://www.google.com/search?q=Knji%9Eevni

So how does Google cope with both encodings? And is there anything that can be done about this Firefox behavior?

+3
2

, latin-1, - , UTF-8.

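My guess is that they first try to decode the value as UTF-8 and, if it is valid UTF-8, treat it as UTF-8. If it is not valid UTF-8, they fall back to Latin-1 (iso-8859-1).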
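A UTF-8-first decoder with a single-byte fallback can be sketched in Python as follows. This is only an illustration of the heuristic, not a description of what Google actually does. The function name `decode_query_value` is hypothetical, and the fallback defaults to `cp1252` rather than strict iso-8859-1, on the assumption that the browser really sends windows-1252 bytes (that is where ž is 0x9E).

```python
from urllib.parse import unquote_to_bytes


def decode_query_value(raw: str, fallback: str = "cp1252") -> str:
    """Decode a percent-encoded value, preferring UTF-8 (hypothetical helper).

    Note: '+' is not translated to a space here; a real query-string parser
    would need to handle that before calling this function.
    """
    data = unquote_to_bytes(raw)
    try:
        # Valid UTF-8 is very unlikely to occur by accident in Latin-1 text
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Assumed fallback; bytes undefined in cp1252 become U+FFFD
        return data.decode(fallback, errors="replace")


print(decode_query_value("Knji%C5%BEevni"))      # Književni
print(decode_query_value("Knji%9Eevni"))         # Književni
print(decode_query_value("%E6%BC%A2%E5%AD%97"))  # 漢字
```

The order matters: almost any byte sequence decodes "successfully" as a single-byte encoding, so the fallback has to be tried last.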

- , UTF-8 , , -, UTF-8, , UTF-8.

, , , Firefox - , , , - , , UTF-8, .

+2

URL- . IDN ( ) (http://en.wikipedia.org/wiki/Internationalized_domain_name).

, , () . (% escaping). html , .

, Firefox, / . .

0

Source: https://habr.com/ru/post/1723373/

