Convert unicode to html hex entity

How do I convert a Unicode string to HTML entities (hex, not decimal)?

For example, convert Français to Fran&#xE7;ais.


The hex values you want are the UCS-4 code points of each character, so you can try:

    $first = preg_replace_callback('/[\x{80}-\x{10FFFF}]/u', function ($m) {
        $char = current($m);                      // the matched non-ASCII character
        $utf = iconv('UTF-8', 'UCS-4', $char);    // four bytes per character
        // e.g. "000000e7" -> "000000E7" -> "E7"
        return sprintf("&#x%s;", ltrim(strtoupper(bin2hex($utf)), "0"));
    }, $string);

Output

    string 'Fran&#xE7;ais' (length=13)
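A minimal sketch of the same iconv approach wrapped as a reusable function. The name `utf8_to_hex_entities()` is hypothetical, and the sketch swaps in `'UCS-4BE'` so the byte order does not depend on how a given iconv build interprets plain `'UCS-4'`. Decoding the result with `html_entity_decode()` is a quick way to check the round trip:

```php
<?php
// Hypothetical helper name; wraps the iconv/UCS-4 approach from the answer above.
function utf8_to_hex_entities(string $string): ?string
{
    return preg_replace_callback('/[\x{80}-\x{10FFFF}]/u', function (array $m): string {
        // 'UCS-4BE' states the big-endian byte order explicitly.
        $ucs4 = iconv('UTF-8', 'UCS-4BE', $m[0]);
        // "\x00\x00\x00\xE7" -> "000000e7" -> "000000E7" -> "E7"
        return sprintf('&#x%s;', ltrim(strtoupper(bin2hex($ucs4)), '0'));
    }, $string);
}

$encoded = utf8_to_hex_entities('Français');
echo $encoded, "\n"; // Fran&#xE7;ais
echo html_entity_decode($encoded, ENT_QUOTES | ENT_HTML5, 'UTF-8'), "\n"; // Français
```

With the `/u` modifier, `preg_replace_callback()` returns `null` for invalid UTF-8 input, hence the `?string` return type.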

To fill in the hexadecimal encoding that was missing in a related question:

    $output = preg_replace_callback('/[\x{80}-\x{10FFFF}]/u', function ($match) {
        list($utf8) = $match;
        $binary = mb_convert_encoding($utf8, 'UTF-32BE', 'UTF-8');
        $entity = vsprintf('&#x%X;', unpack('N', $binary));
        return $entity;
    }, $input);

This is similar to @Baba's answer, but it converts to UTF-32BE and then uses unpack() and vsprintf() to format the entity.
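To make the UTF-32BE pipeline concrete, here is a small trace (assuming the mbstring extension is loaded) that prints the intermediate value at each step for a few sample characters:

```php
<?php
// Trace each step of the UTF-32BE + unpack() + vsprintf() approach.
foreach (['ç', '€', '😀'] as $char) {
    // UTF-32BE stores every code point in exactly four big-endian bytes.
    $binary = mb_convert_encoding($char, 'UTF-32BE', 'UTF-8');
    // 'N' reads an unsigned 32-bit big-endian integer; unpack() keys it as 1.
    $unpacked = unpack('N', $binary);
    // %X prints that integer as uppercase hex.
    printf("%s  bytes=%s  code point=%d  entity=%s\n",
        $char, bin2hex($binary), $unpacked[1], vsprintf('&#x%X;', $unpacked));
}
// ç  bytes=000000e7  code point=231  entity=&#xE7;
// €  bytes=000020ac  code point=8364  entity=&#x20AC;
// 😀  bytes=0001f600  code point=128512  entity=&#x1F600;
```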

If you prefer iconv over mb_convert_encoding, this looks like:

    $output = preg_replace_callback('/[\x{80}-\x{10FFFF}]/u', function ($match) {
        list($utf8) = $match;
        $binary = iconv('UTF-8', 'UTF-32BE', $utf8);
        $entity = vsprintf('&#x%X;', unpack('N', $binary));
        return $entity;
    }, $input);

I find this string manipulation a bit clearer than the one in Get hex code for HTML entities.
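On PHP 7.2 and later, `mb_ord()` is built in and returns the code point directly, so the UTF-32BE/UCS-4 conversion and the `unpack()` step can be dropped. This is a swapped-in alternative, not part of either answer, and the function name `to_hex_entities()` is hypothetical:

```php
<?php
// Sketch for PHP 7.2+: mb_ord() yields the code point of the matched character,
// which sprintf() then formats as an uppercase hex entity.
function to_hex_entities(string $input): ?string
{
    return preg_replace_callback('/[\x{80}-\x{10FFFF}]/u', function (array $match): string {
        return sprintf('&#x%X;', mb_ord($match[0], 'UTF-8'));
    }, $input);
}

echo to_hex_entities('Français €5'), "\n"; // Fran&#xE7;ais &#x20AC;5
```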


First, when I ran into this problem recently, I solved it by making sure that my code files, database connection and database tables were all UTF-8. After that, simply echoing the text works. If you need to escape output from the database, use htmlspecialchars() rather than htmlentities(), so that UTF-8 characters are left alone instead of being escaped.
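A short illustration of that difference, assuming the page itself is served as UTF-8: `htmlspecialchars()` only escapes HTML-special characters, while `htmlentities()` also replaces characters such as ç with named entities.

```php
<?php
$fromDb = 'Français <b>&</b>';

// htmlspecialchars() escapes only &, <, >, " and ' (with ENT_QUOTES);
// the UTF-8 "ç" passes through untouched.
echo htmlspecialchars($fromDb, ENT_QUOTES, 'UTF-8'), "\n";
// Français &lt;b&gt;&amp;&lt;/b&gt;

// htmlentities() additionally converts every character that has a named entity.
echo htmlentities($fromDb, ENT_QUOTES, 'UTF-8'), "\n";
// Fran&ccedil;ais &lt;b&gt;&amp;&lt;/b&gt;
```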

I would like to document an alternative solution, because it solved a similar problem for me. I had used PHP's utf8_encode() to handle special characters.

I wanted to convert them to HTML entities for display. I wrote this code because I wanted to avoid iconv and similar functions where possible, since not every environment necessarily has them (correct me if I'm wrong!)

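    // Turn literal "\uXXXX" escape sequences into "&#xXXXX;" hex entities.
    function unicode2html($string) {
        return preg_replace('/\\\\u([0-9a-fA-F]{4})/', '&#x$1;', $string);
    }

    $foo = 'This is my test string \u03b50';
    echo unicode2html($foo); // This is my test string &#x03b5;0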
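unicode2html() only rewrites literal `\u` escapes followed by exactly four characters, so characters outside the Basic Multilingual Plane (which JSON-style escapes write as two `\uXXXX` surrogates) come out as two entities that do not form a valid character. If the text contains the actual UTF-8 characters and you still want to avoid iconv and mbstring, the code points can be decoded by hand. This is a sketch with a hypothetical function name; it still relies on PCRE's `/u` modifier, which is bundled with PHP:

```php
<?php
// Hypothetical helper: convert non-ASCII characters to hex entities without
// iconv or mbstring by decoding the UTF-8 bytes manually.
function utf8_to_hex_entities_plain(string $string): ?string
{
    return preg_replace_callback('/[\x{80}-\x{10FFFF}]/u', function (array $m): string {
        $bytes = $m[0];
        $len = strlen($bytes);   // 2, 3 or 4 bytes for U+0080 and above
        // The lead byte carries 5, 4 or 3 payload bits for 2-, 3- or 4-byte sequences.
        $cp = ord($bytes[0]) & (0xFF >> ($len + 1));
        // Each continuation byte (10xxxxxx) contributes 6 more bits.
        for ($i = 1; $i < $len; $i++) {
            $cp = ($cp << 6) | (ord($bytes[$i]) & 0x3F);
        }
        return sprintf('&#x%X;', $cp);
    }, $string);
}

echo utf8_to_hex_entities_plain('Français €'), "\n"; // Fran&#xE7;ais &#x20AC;
```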

Hope this helps someone who needs it :-)


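See How to get character from Unicode code point in PHP? for some code that allows you to do the following. Note that mb_htmlentities(), mb_html_entity_decode(), codepoint_encode() and codepoint_decode() are helpers defined in that answer, not PHP built-ins. PHP 7.2+ does ship native mb_chr() and mb_ord(), but the 'UCS-4BE' calls below depend on the linked answer's own versions of those functions and may behave differently with the built-ins.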

Usage example:

    echo "Get string from numeric DEC value\n";
    var_dump(mb_chr(50319, 'UCS-4BE'));
    var_dump(mb_chr(271));

    echo "\nGet string from numeric HEX value\n";
    var_dump(mb_chr(0xC48F, 'UCS-4BE'));
    var_dump(mb_chr(0x010F));

    echo "\nGet numeric value of character as DEC int\n";
    var_dump(mb_ord('ď', 'UCS-4BE'));
    var_dump(mb_ord('ď'));

    echo "\nGet numeric value of character as HEX string\n";
    var_dump(dechex(mb_ord('ď', 'UCS-4BE')));
    var_dump(dechex(mb_ord('ď')));

    echo "\nEncode / decode to DEC based HTML entities\n";
    var_dump(mb_htmlentities('tchüß', false));
    var_dump(mb_html_entity_decode('tch&#252;&#223;'));

    echo "\nEncode / decode to HEX based HTML entities\n";
    var_dump(mb_htmlentities('tchüß'));
    var_dump(mb_html_entity_decode('tch&#xFC;&#xDF;'));

    echo "\nUse JSON encoding / decoding\n";
    var_dump(codepoint_encode("tchüß"));
    var_dump(codepoint_decode('tch\u00fc\u00df'));

Output:

    Get string from numeric DEC value
    string(4) "ď"
    string(2) "ď"

    Get string from numeric HEX value
    string(4) "ď"
    string(2) "ď"

    Get numeric value of character as DEC int
    int(50319)
    int(271)

    Get numeric value of character as HEX string
    string(4) "c48f"
    string(3) "10f"

    Encode / decode to DEC based HTML entities
    string(15) "tch&#252;&#223;"
    string(7) "tchüß"

    Encode / decode to HEX based HTML entities
    string(15) "tch&#xFC;&#xDF;"
    string(7) "tchüß"

    Use JSON encoding / decoding
    string(15) "tch\u00fc\u00df"
    string(7) "tchüß"
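If you would rather not copy those helpers, PHP's bundled mbstring functions can cover the entity step: `mb_encode_numericentity()` takes a conversion map and, since PHP 5.4, a `$hex` flag, while `html_entity_decode()` decodes both decimal and hex numeric entities. This is a sketch with an assumed conversion map covering U+0080 to U+10FFFF:

```php
<?php
// Conversion map: [start, end, offset, mask]. Every code point in
// U+0080..U+10FFFF is turned into a numeric entity; ASCII is left alone.
$convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];

// Fourth argument true selects hexadecimal (&#x...;) entities.
$hex = mb_encode_numericentity('tchüß', $convmap, 'UTF-8', true);
$dec = mb_encode_numericentity('tchüß', $convmap, 'UTF-8');

echo $hex, "\n";
echo $dec, "\n"; // tch&#252;&#223;

// html_entity_decode() understands both forms.
echo html_entity_decode($hex, ENT_QUOTES | ENT_HTML5, 'UTF-8'), "\n"; // tchüß
```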

Source: https://habr.com/ru/post/1444688/

