URL Query String Encoding Detection

In the request URL, the query string can arrive as either ?dir=Documents%20partag%C3%A9s or ?dir=Documents%20partag%E9s. I think the first one is UTF-8 and the second is ASCII.

Actual string: Documents partagés

I have a PHP script (saved as UTF-8). I want to determine whether the query string is ASCII or UTF-8 and, if it is ASCII, convert it to UTF-8.

I tried using the mb_ functions, but the raw query string is always detected as ASCII, and the urldecoded version is always detected as UTF-8.

How can I achieve this? Note that Wikipedia does something similar: it re-encodes %E9 as %C3%A9 on its own.

1 answer

E9 is 233 in decimal. That is not a valid ASCII byte (ASCII only covers 0-127), but it is é in ISO-8859-1 (Latin-1). With mb_convert_encoding you can pass several candidate source encodings (for example, UTF-8 and ISO-8859-1), and the function tries to detect which one the input uses.

This should fix it:

 mb_convert_encoding($str, 'UTF-8', 'UTF-8,ISO-8859-1'); 

With the following script:

 $str1 = 'Documents%20partag%E9s';
 $str2 = 'Documents%20partag%C3%A9s';
 var_dump(mb_convert_encoding(urldecode($str1), 'UTF-8', 'UTF-8,ISO-8859-1'));
 var_dump(mb_convert_encoding(urldecode($str2), 'UTF-8', 'UTF-8,ISO-8859-1'));

I get:

 string(19) "Documents partagés"
 string(19) "Documents partagés"
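
If you would rather not rely on encoding detection, a more explicit variant is to check whether the decoded bytes are already valid UTF-8 and convert only when they are not. The sketch below is not part of the original answer: the helper name decode_query_value_to_utf8 is hypothetical, and it assumes that any input which is not valid UTF-8 is ISO-8859-1.

```php
<?php
// Hypothetical helper (not from the original answer): normalise a
// percent-encoded query value to UTF-8.
function decode_query_value_to_utf8(string $encoded): string
{
    // Turn %XX escapes (and '+') back into raw bytes.
    $decoded = urldecode($encoded);

    // Valid UTF-8 (which includes plain ASCII) is returned unchanged.
    if (mb_check_encoding($decoded, 'UTF-8')) {
        return $decoded;
    }

    // Assumption: anything that is not valid UTF-8 is ISO-8859-1.
    return mb_convert_encoding($decoded, 'UTF-8', 'ISO-8859-1');
}

var_dump(decode_query_value_to_utf8('Documents%20partag%E9s'));
var_dump(decode_query_value_to_utf8('Documents%20partag%C3%A9s'));
```

Both calls should yield the same UTF-8 string. One limitation to keep in mind: a Latin-1 byte sequence that happens to also be valid UTF-8 (for example the two bytes C3 A9, which read as "Ã©" in Latin-1) is kept as UTF-8, so the check cannot distinguish those cases.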

Source: https://habr.com/ru/post/1337370/

