Printing Unicode characters in PHP

I have a database that stores the names of video games with Unicode characters, but I cannot figure out how to correctly escape these Unicode characters when printing them in the HTML response.

For example, when I print all games named Uncharted, I get the following:

Uncharted: Drake Fortuneâ„¢ Uncharted 2: Among Thievesâ„¢ Uncharted 3: Drake Deceptionâ„¢ 

but it should display this:

 Uncharted: Drake Fortune™ Uncharted 2: Among Thieves™ Uncharted 3: Drake Deception™ 

I ran a quick JavaScript escape function to see which Unicode character it is, and found that it is \u2122.

I have no problem fully escaping every character in the string if that is what it takes to display the character correctly. My guess is that I need to find the hexadecimal representation of each character in the string and have PHP output the Unicode characters like this:

 print "&#x2122;"; 

I am asking for the best approach to escaping a Unicode string so that it is HTML-friendly. I did something similar in JavaScript, but JavaScript has built-in escape and unescape functions.

I am not aware of any PHP functions with similar functionality. I read about ord, but it only returns the ASCII character code for the given character, so it does not help with characters such as ™. I would like this function to be versatile enough to apply to any string containing valid Unicode characters.

+6
4 answers

It looks like your strings are encoded as UTF-8 internally and PHP outputs them correctly, but your browser fails to detect the encoding automatically (it decides on ISO 8859-1 or some other encoding).

The best way is to tell the browser that UTF-8 is being used by sending the appropriate HTTP header:

 header("content-type: text/html; charset=UTF-8"); 

Then you can leave the rest of your code as is, with no need to HTML-encode entities or create any other mess.

If you want, you can additionally declare the encoding in the generated HTML using the <meta> tag:

  • <meta http-equiv=Content-Type content="text/html; charset=UTF-8"> for HTML <= 4.01
  • <meta charset="UTF-8"> for HTML5

The HTTP header takes precedence over the <meta> tag, but the latter can be useful if the HTML is saved to disk and then read locally.
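As a rough sketch of how the header and the <meta> tag fit together, the snippet below sends the HTTP header and then prints a small page. The function name render_page and the hard-coded title list are hypothetical and stand in for whatever the real application reads from the database. It also assumes PHP 7.0 or later for the "\u{...}" escape.

```php
<?php
// Hypothetical helper: builds a minimal HTML5 page listing the given titles.
function render_page(array $titles): string
{
    $items = '';
    foreach ($titles as $title) {
        // Escape only the HTML-special characters; ™ is left as raw UTF-8.
        $items .= '<li>' . htmlspecialchars($title, ENT_QUOTES, 'UTF-8') . "</li>\n";
    }

    return "<!DOCTYPE html>\n<html>\n<head>\n<meta charset=\"UTF-8\">\n"
        . "<title>Games</title>\n</head>\n<body>\n<ul>\n"
        . $items
        . "</ul>\n</body>\n</html>\n";
}

// Placeholder data; "\u{2122}" is the PHP 7+ escape for the ™ character.
$games = [
    "Uncharted: Drake Fortune\u{2122}",
    "Uncharted 2: Among Thieves\u{2122}",
];

// Must be sent before any output.
header('Content-Type: text/html; charset=UTF-8');
echo render_page($games);
```

With the header in place, the ™ bytes are sent unchanged and the browser decodes them as UTF-8.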

+14

I spent a lot of time trying to find a better way to simply print the character for a given Unicode code point, and the methods I found either did not work or were very complicated.

It turns out that JSON can represent Unicode characters using the syntax "\u[unicode_code]", so:

 echo json_decode('"\u00e1"'); 

The equivalent Unicode character will be printed, in this case: á

P.S. Pay attention to the single and double quotes. If you do not use both, this will not work.
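To tie this back to the character from the question, here is a small sketch that decodes \u2122 through json_decode and compares it with the "\u{...}" string escape that PHP 7.0 and later support directly. The variable names are only illustrative.

```php
<?php
// Single quotes keep the backslash literal, so json_decode sees "\u2122".
$viaJson = json_decode('"\u2122"');

// PHP 7.0+ alternative: a Unicode code point escape in a double-quoted string.
$viaEscape = "\u{2122}";

// Both are the UTF-8 encoding of U+2122 (TRADE MARK SIGN).
var_dump($viaJson === $viaEscape); // bool(true)
echo bin2hex($viaJson), "\n";      // e284a2
```

On PHP versions before 7.0 only the json_decode form is available.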

+9

Try the following:

 echo htmlentities("Uncharted: Drakes Fortune™ \n", ENT_QUOTES, "UTF-8"); 

From: http://php.net/htmlentities
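For comparison, the sketch below shows what htmlentities produces for ™ and one possible way to get numeric entities instead, which is closer to what the question describes. The helper name to_numeric_entities is hypothetical, and the code assumes the mbstring extension is loaded and PHP 7.0 or later.

```php
<?php
// Hypothetical helper: escapes HTML-special characters, then turns every
// non-ASCII character into a decimal numeric entity.
function to_numeric_entities(string $text): string
{
    // First escape &, <, >, and quotes so the result is safe inside HTML.
    $escaped = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    // Conversion map: [first code point, last code point, offset, mask].
    $convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];

    return mb_encode_numericentity($escaped, $convmap, 'UTF-8');
}

$title = "Uncharted: Drake Fortune\u{2122}";

// Named entity where one exists in the HTML 4.01 table.
echo htmlentities($title, ENT_QUOTES, 'UTF-8'), "\n"; // Uncharted: Drake Fortune&trade;

// Numeric entity for every non-ASCII character.
echo to_numeric_entities($title), "\n";               // Uncharted: Drake Fortune&#8482;
```

Either form renders as ™ regardless of the charset the browser assumes, at the cost of a larger and less readable response.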

+6
 // PHP 7.0
 var_dump(
     IntlChar::chr(0x2122),
     IntlChar::chr(0x1F638)
 );

 var_dump(
     utf8_chr(0x2122),
     utf8_chr(0x1F638)
 );

 function utf8_chr($cp)
 {
     if (!is_int($cp)) {
         exit("$cp is not integer\n");
     }

     // UTF-8 prohibits characters between U+D800 and U+DFFF
     // https://tools.ietf.org/html/rfc3629#section-3
     //
     // Q: Are there any 16-bit values that are invalid?
     // http://unicode.org/faq/utf_bom.html#utf16-7
     if ($cp < 0 || (0xD7FF < $cp && $cp < 0xE000) || 0x10FFFF < $cp) {
         exit("$cp is out of range\n");
     }

     if ($cp < 0x10000) {
         return json_decode('"\u'.bin2hex(pack('n', $cp)).'"');
     }

     // Q: Isn't there a simpler way to do this?
     // http://unicode.org/faq/utf_bom.html#utf16-4
     $lead = 0xD800 - (0x10000 >> 10) + ($cp >> 10);
     $trail = 0xDC00 + ($cp & 0x3FF);

     return json_decode('"\u'.bin2hex(pack('n', $lead)).'\u'.bin2hex(pack('n', $trail)).'"');
 }
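If the intl extension is not available, the same conversion can be sketched with mbstring instead of the JSON surrogate-pair trick. This is an alternative technique, not the code above: the function name codepoint_to_utf8 is hypothetical, and it assumes the mbstring extension is loaded.

```php
<?php
// Hypothetical alternative: convert a code point to UTF-8 via UCS-4.
function codepoint_to_utf8(int $cp): string
{
    // Reject values outside the Unicode range and the surrogate block.
    if ($cp < 0 || $cp > 0x10FFFF || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
        throw new InvalidArgumentException("Code point out of range: $cp");
    }

    // pack('N') yields the 4-byte big-endian form, which is UCS-4BE.
    return mb_convert_encoding(pack('N', $cp), 'UTF-8', 'UCS-4BE');
}

echo bin2hex(codepoint_to_utf8(0x2122)), "\n";  // e284a2
echo bin2hex(codepoint_to_utf8(0x1F638)), "\n"; // f09f98b8
```

Throwing an exception instead of calling exit lets the caller decide how to handle an invalid code point.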
+1

Source: https://habr.com/ru/post/948991/
