First, note that &dollar; is not a known entity in HTML 4.01. It is, however, in HTML 5, and in PHP 5.4 you can call html_entity_decode with ENT_QUOTES | ENT_HTML5 to decode it.
You must decode the entity and only then convert it:
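As a quick check of the point about entity tables, the sketch below decodes `&dollar;` once with the HTML 4.01 table and once with the HTML 5 table. It assumes PHP 5.4 or later (where `ENT_HTML401` and `ENT_HTML5` exist); the variable names are only illustrative.

```php
<?php
// &dollar; is only defined in the HTML 5 entity table,
// so the document-type flag decides whether it gets decoded.
$html401 = html_entity_decode('&dollar;', ENT_QUOTES | ENT_HTML401, 'UTF-8');
$html5   = html_entity_decode('&dollar;', ENT_QUOTES | ENT_HTML5, 'UTF-8');

var_dump($html401); // the entity should come back undecoded
var_dump($html5);   // should be a literal dollar sign
```

If the second call does not return a plain `$`, the PHP version in use is probably older than 5.4.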
    // assumes $str is in UTF-8 (or ASCII)
    function foo($str) {
        $dec = html_entity_decode($str, ENT_QUOTES, "UTF-8");
        // convert to UTF-16BE
        $enc = mb_convert_encoding($dec, "UTF-16BE", "UTF-8");
        $out = "";
        foreach (str_split($enc, 2) as $f) {
            $out .= "\\u" . sprintf("%04X", ord($f[0]) << 8 | ord($f[1]));
        }
        return $out;
    }
If you want to replace only the entities, you can use preg_replace_callback to match them and then use foo as the callback.
    function repl_only_ent($str) {
        return preg_replace_callback('/&[^;]+;/', function($m) {
            return foo($m[0]);
        }, $str);
    }

    echo repl_only_ent("&euro;foobar &acute;");
gives:
    \u20ACfoobar \u00B4
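Note that `foo` above decodes with `ENT_QUOTES` only, so an HTML 5-only entity such as `&dollar;` would not be resolved by it. A minimal sketch of a variant that adds `ENT_HTML5` follows; the names `foo_html5` and `repl_only_ent_html5` are hypothetical, and the code assumes PHP 5.4+ with the mbstring extension and UTF-8 (or ASCII) input.

```php
<?php
// Hypothetical variant of foo(): same conversion, but decoding
// against the HTML 5 entity table instead of the default one.
function foo_html5($str) {
    $dec = html_entity_decode($str, ENT_QUOTES | ENT_HTML5, "UTF-8");
    // Re-encode as UTF-16BE so each 2-byte unit maps to one \uXXXX escape.
    $enc = mb_convert_encoding($dec, "UTF-16BE", "UTF-8");
    $out = "";
    foreach (str_split($enc, 2) as $unit) {
        $out .= sprintf("\\u%04X", ord($unit[0]) << 8 | ord($unit[1]));
    }
    return $out;
}

// Hypothetical variant of repl_only_ent(): only entity-like
// substrings are passed through foo_html5(); other text is kept.
function repl_only_ent_html5($str) {
    return preg_replace_callback('/&[^;]+;/', function ($m) {
        return foo_html5($m[0]);
    }, $str);
}

echo repl_only_ent_html5('&dollar; and &euro;'), "\n";
// prints: \u0024 and \u20AC
```

As with the original, anything matching `&...;` that the decoder does not recognise is escaped character by character rather than left alone.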