Convert UTF-8 text for use in a URL

I am developing an international site that uses UTF-8 to display non-English characters. I also use friendly URLs that contain the name of the element. Obviously, I cannot use non-English characters in the URL.

Is there any common practice for this conversion? I am not sure which English characters I should replace them with. Some are fairly obvious (for example, è to e), but there are other characters I am not familiar with (for example, ß).

+4
5 answers

I usually use iconv() with the 'ASCII//TRANSLIT' option. This takes input like:

último año 

and produces output like:

 'ultimo a~no 

Then I use preg_replace() to replace spaces with dashes:

 'ultimo-a~no 

... and remove unwanted characters, for example everything matching:

 [^a-z0-9-] 

This is probably useless with Arabic or Chinese, but works great with Spanish, French or German.
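The steps above can be sketched roughly as follows. This is only an illustration: the function name slugify() is made up for the example, the lowercasing and dash-collapsing steps are additions not mentioned in the answer, and the exact result of 'ASCII//TRANSLIT' depends on the platform's iconv implementation and locale, so no output is promised for accented input.

```php
<?php
// Hypothetical helper (the name is an assumption, not from the answer).
function slugify(string $text): string
{
    // Step 1: transliterate UTF-8 to ASCII. The result is platform-dependent;
    // iconv() returns false if it cannot convert the input at all.
    $ascii = iconv('UTF-8', 'ASCII//TRANSLIT', $text);
    if ($ascii === false) {
        // Fall back to the raw input; non-ASCII bytes are stripped below.
        $ascii = $text;
    }

    // Assumption: lowercase so the [^a-z0-9-] filter keeps the letters.
    $ascii = strtolower($ascii);

    // Step 2: replace runs of whitespace with a dash.
    $ascii = preg_replace('/\s+/', '-', $ascii);

    // Step 3: remove everything that is not a-z, 0-9 or a dash
    // (this also drops leftovers such as ' and ~ from transliteration).
    $ascii = preg_replace('/[^a-z0-9-]/', '', $ascii);

    // Assumption: collapse repeated dashes and trim them from the ends.
    $ascii = preg_replace('/-+/', '-', $ascii);

    return trim($ascii, '-');
}

echo slugify('último año'), "\n";
```

It is probably worth storing the generated slug next to the original name, since the transliteration cannot be reversed.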

+5

You can use UTF-8 encoded data in URLs. You just need to encode it using percent-encoding (see rawurlencode()):

 // ß (U+00DF) = 0xC39F (UTF-8)
 $str = "\xC3\x9F";
 echo '<a href="http://en.wikipedia.org/wiki/'.rawurlencode($str).'">'.$str.'</a>';

This produces a link to http://en.wikipedia.org/wiki/ß. Modern browsers display the ß character itself in the location bar instead of its percent-encoded UTF-8 representation (%C3%9F).

If you do not want to use UTF-8, but only ASCII characters, I suggest using transliteration, for example, as proposed by Álvaro G. Vicario.

+6

Obviously, I cannot use non-English characters in the URL.

Actually, you can. Wikipedia's software (written in PHP) supports this, for example: ru.wikipedia.org/wiki/☃.

Please note that you need to encode the URL correctly, as shown in the other answers.

+3

Use rawurlencode() to encode the name for the URL and rawurldecode() to convert the name in the URL back to the original string. These two functions convert strings to and from URL form in accordance with RFC 1738.
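A minimal round-trip sketch of those two calls, assuming a made-up element name; the name is written with \u{...} escapes (PHP 7 or later) so the example does not depend on the encoding of the source file:

```php
<?php
// Hypothetical element name: "último año" as UTF-8.
$name = "\u{00FA}ltimo a\u{00F1}o";

// Percent-encode every byte that is not safe in a URL path segment.
$encoded = rawurlencode($name);

// Reverse the encoding to recover the original UTF-8 string.
$decoded = rawurldecode($encoded);

echo $encoded, "\n";           // %C3%BAltimo%20a%C3%B1o
var_dump($decoded === $name);  // bool(true)
```

Unlike transliteration, this keeps the original characters, so the name can be recovered from the URL exactly.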

+2

The last time I tried (about a week ago), UTF-8 characters (specifically Japanese ones) worked fine in URLs without any extra encoding. They even looked right in the address bar in all the browsers I tested (Safari, Chrome and Firefox, all on Mac); I have no idea which browser my girlfriend was using on Windows. Apart from the fact that most of the Windows machines I looked at just showed squares for the Japanese characters, because they lacked the fonts needed to display them, it seemed to work fine there too.

URL I tried: http://www.webghoul.de.private-void.net/cache/black-f-with- あ い -50.png (WMD doesn't seem to like it)

Proof by screenshot: http://heavymetal.theredhead.nl/~kris/stackoverflow/screenshot-utf8-url.png

So it may not be allowed by the specification, but as far as I have seen it works well across the board, except perhaps in editors that like the specification too much ;-)

I would not actually recommend using these kinds of characters in URLs, but I also would not make "fixing" them a top priority.

-1

Source: https://habr.com/ru/post/1303702/

