Unicode equivalent of charCodeAt in PHP

I have some simple JS code that I cannot replicate in PHP when it comes to special characters.

This is the JS code (see JSFiddle for output):

var str = "t🙏🏿😘🎚↙️🕗🇨🇬芳"; //char "t" and special characters, emojis, etc..
document.write("Length is: "+str.length);
for(var i=0; i<str.length; i++) {
  document.write("<br> charCodeAt(" + i + "): " + str.charCodeAt(i));
}

PHP's strlen() already gives a different result, but I managed to get the same value as JS with a custom function, JS_StringLength (thanks to this SO answer).
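Here is a small sketch of why strlen() disagrees with JS. For a single emoji, PHP can count UTF-8 bytes, code points, or UTF-16 code units, and only the last one matches JS's .length. The helper utf16Length is a hypothetical name that uses the same iconv trick as JS_StringLength. The sketch assumes the iconv and mbstring extensions are enabled.

```php
<?php

// Hypothetical helper: counts UTF-16 code units, which is what JS's .length reports.
// Each code unit is 2 bytes in UTF-16LE, so byte length / 2 = number of units.
function utf16Length(string $s): int
{
    return intdiv(strlen(iconv('UTF-8', 'UTF-16LE', $s)), 2);
}

$kiss = "\u{1F618}"; // 😘, outside the Basic Multilingual Plane

echo strlen($kiss), "\n";             // 4: UTF-8 bytes
echo mb_strlen($kiss, 'UTF-8'), "\n"; // 1: code points
echo utf16Length($kiss), "\n";        // 2: UTF-16 code units (a surrogate pair), same as JS
```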

Here is what I have in PHP so far (see Sandbox for output):

<?php

function JS_StringLength($string) {
    return strlen(iconv('UTF-8', 'UTF-16LE', $string)) / 2;
}

function JS_charCodeAt($str, $index){
    //not working!

    $char = mb_substr($str, $index, 1, 'UTF-8');
    if (mb_check_encoding($char, 'UTF-8'))
    {
        $ret = mb_convert_encoding($char, 'UTF-32BE', 'UTF-8');
        return hexdec(bin2hex($ret));
    } else {
        return null;
    }
}

$str = "t🙏🏿😘🎚↙️🕗🇨🇬芳";

echo $str."\n";
//echo "Length is: ".strlen($str)."\n"; //wrong
echo "Length is: ".JS_StringLength($str)."\n"; //OK
for($i=0; $i<JS_StringLength($str); $i++) {
    echo "charCodeAt(".$i."): ".JS_charCodeAt($str, $i)."\n";
}

However, after a whole day of searching, nothing I found on Google gives the same results as JS.

Please help me, I have run out of ideas!
What should JS_charCodeAt look like to get the same results as JS?
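A single emoji shows the difference. mb_substr() indexes by code point, and the UTF-32BE conversion yields the whole code point. JS charCodeAt, by contrast, works on 16-bit UTF-16 code units. The following is a minimal comparison, assuming the iconv and mbstring extensions. utf16CodeUnits is a hypothetical helper name.

```php
<?php

// Hypothetical helper: all UTF-16 code units of a UTF-8 string, as integers.
function utf16CodeUnits(string $s): array
{
    // 'n*' reads unsigned 16-bit big-endian values, matching UTF-16BE byte order.
    return array_values(unpack('n*', iconv('UTF-8', 'UTF-16BE', $s)));
}

$kiss = "\u{1F618}"; // 😘

// What the question's approach computes: the full code point of character 0.
$utf32 = mb_convert_encoding(mb_substr($kiss, 0, 1, 'UTF-8'), 'UTF-32BE', 'UTF-8');
echo hexdec(bin2hex($utf32)), "\n"; // 128536 (0x1F618)

// What JS charCodeAt(0) and charCodeAt(1) return: the two surrogate halves.
print_r(utf16CodeUnits($kiss)); // contains 55357 (0xD83D) and 56856 (0xDE18)
```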

EDIT: Is there a way to reproduce JavaScript's charCodeAt in PHP so that it returns the same values, including for emojis?

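Update 1: This converter shows exactly the values I need: https://r12a.github.io/apps/conversion/ (the UTF-16 code units). As I understand it, JS uses UTF-16 while PHP uses UTF-8. So should I convert the string to UTF-16 in PHP and then convert the hex values to decimal? How do I do that?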
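The "convert to UTF-16, then hex to decimal" route does work. Below is a sketch of it, assuming mbstring. utf16CodesViaHex is a hypothetical name. Converting to UTF-16BE keeps each code unit as 4 hex digits in reading order, so no byte swapping is needed.

```php
<?php

// Hypothetical helper: UTF-16 code units via a hex dump, one 4-digit group per unit.
function utf16CodesViaHex(string $s): array
{
    $hex = bin2hex(mb_convert_encoding($s, 'UTF-16BE', 'UTF-8'));
    // Every UTF-16 code unit is 2 bytes = 4 hex digits (big-endian, high byte first).
    // Guard the empty case: str_split('') returns [''] before PHP 8.2.
    $groups = $hex === '' ? [] : str_split($hex, 4);
    return array_map('hexdec', $groups);
}

print_r(utf16CodesViaHex("t\u{1F618}")); // 116, 55357, 56856
```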

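UPDATE 2: Solved. After all that searching, it turns out json_encode() already produces the UTF-16 escapes (facepalm), the same code units JavaScript works with. Thanks!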

Answer:

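As you already noticed, JS works with UTF-16 internally, and charCodeAt returns a single 16-bit code unit, not a whole character. If you need whole characters, there is String.codePointAt(). If you still need the same code units in PHP, you can get them without going through JSON: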

<?php

$original = 't🙏🏿😘🎚↙️🕗🇨🇬芳';
$converted = iconv('UTF-8', 'UTF-16LE', $original);

for ($i = 0; $i < iconv_strlen($converted, 'UTF-16LE'); $i++) {
    $character = iconv_substr($converted, $i, 1, 'UTF-16LE');
    $codeUnits = unpack('v*', $character);

    foreach ($codeUnits as $codeUnit) {
        echo $codeUnit . PHP_EOL;
    }
}

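Convert the string from UTF-8 to UTF-16, then iterate over its characters. In UTF-16 a character takes 2 or 4 bytes. The v format of unpack() reads 2 bytes at a time (v is an unsigned 16-bit little-endian short).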

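Working directly on the UTF-8 string will not give you UTF-16 code units, so convert first. The same can be done with the mb_* functions.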
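The per-character loop is not strictly required. Unpacking the whole UTF-16LE string at once yields every code unit in order, including surrogate halves. The sketch below uses the mb_* functions instead of iconv. jsCharCodeAt is a hypothetical name, and returning null for an out-of-range index (where JS returns NaN) is a design choice of this sketch.

```php
<?php

// Hypothetical helper: emulates str.charCodeAt(index) using mbstring.
// Returns null for an out-of-range index (JS returns NaN there).
function jsCharCodeAt(string $str, int $index): ?int
{
    $utf16 = mb_convert_encoding($str, 'UTF-16LE', 'UTF-8');
    // 'v*' = unsigned 16-bit little-endian, one value per UTF-16 code unit.
    $units = array_values(unpack('v*', $utf16));
    return $units[$index] ?? null;
}

$str = "t\u{1F64F}\u{1F3FF}"; // t + 🙏🏿 (emoji + skin tone modifier)
for ($i = 0; $i < 5; $i++) {
    echo "charCodeAt($i): ", jsCharCodeAt($str, $i), "\n";
}
```

This re-encodes the string on every call. For a loop over a long string, it is cheaper to build the units array once and index into it.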

Answer:

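Thanks, everyone, for the help. It turned out that json_encode() escapes characters exactly the way JS stores them (for example, 😘 = "\ud83d\ude18"), and those are exactly the values charCodeAt returns. In other words, JSON encoding escapes every non-ASCII character as UTF-16 code units (as JS uses). So all that is left is to split the encoded string into units and compute charCodeAt for each one (ord() for a plain character, hex to dec for \uXXXX).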
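The two \uXXXX escapes that json_encode() produces for an emoji form a surrogate pair. The standard UTF-16 decoding formula turns them back into the code point that codePointAt would return. Here is a short sketch; surrogatePairToCodePoint is a hypothetical name.

```php
<?php

// Hypothetical helper: combines a UTF-16 surrogate pair into one code point
// using the standard UTF-16 decoding formula.
function surrogatePairToCodePoint(int $high, int $low): int
{
    // high surrogates are 0xD800..0xDBFF, low surrogates 0xDC00..0xDFFF
    return (($high - 0xD800) << 10) + ($low - 0xDC00) + 0x10000;
}

echo json_encode("\u{1F618}"), "\n"; // "\ud83d\ude18"
printf("U+%X\n", surrogatePairToCodePoint(0xD83D, 0xDE18)); // U+1F618
```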

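Note: calling "JS charCodeAt" inside a for loop is slow, because preg_match_all in getUTF16CodeUnits runs on every call. It is better to call getUTF16CodeUnits once and reuse the result. Demo: (backup)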

Code:

<?php

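function getUTF16CodeUnits($string) {
    // json_encode() escapes every non-ASCII character as \uXXXX UTF-16 code units
    $string = substr(json_encode($string), 1, -1);
    // one unit = a \uXXXX escape, a two-character escape such as \" \\ \/ \n, or one ASCII char
    preg_match_all("/\\\\u[0-9a-fA-F]{4}|\\\\.|./", $string, $matches);
    return $matches[0];
}

function JS_StringLength($string) {
    return count(getUTF16CodeUnits($string));
}

function JS_charCodeAt($string, $index) {
    $utf16CodeUnits = getUTF16CodeUnits($string);
    $unit = $utf16CodeUnits[$index];

    if(strlen($unit) === 6) {
        // \uXXXX: convert the hex digits to decimal
        return hexdec(substr($unit, 2));
    }
    elseif(strlen($unit) === 2) {
        // short JSON escape (e.g. \/ or \"): decode it back to the original ASCII character
        return ord(json_decode('"' . $unit . '"'));
    }
    else {
        return ord($unit);
    }
}

$str = "t🙏🏿😘🎚↙️🕗🇨🇬芳";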

echo "Length is: ".JS_StringLength($str)."\n";
for($i=0; $i<JS_StringLength($str); $i++) {
    echo "charCodeAt(".$i."): ".JS_charCodeAt($str, $i)."\n";
}
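getUTF16CodeUnits() re-runs json_encode() and preg_match_all() on every JS_charCodeAt() call, and JS_StringLength() is re-evaluated in the loop condition. As a result, the loop above repeats the same work for each index. The sketch below applies the same JSON idea once and returns integers, so each charCodeAt becomes an array lookup. jsCharCodes is a hypothetical name. It assumes valid UTF-8 and throws otherwise, since json_encode() returns false on invalid input.

```php
<?php

// Hypothetical helper: every UTF-16 code unit of $string as an int, computed once.
function jsCharCodes(string $string): array
{
    $json = json_encode($string);
    if ($json === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    // \uXXXX escape, other two-character JSON escape (\" \\ \/ \n ...), or one ASCII char
    preg_match_all('/\\\\u([0-9a-fA-F]{4})|\\\\.|./', substr($json, 1, -1), $matches, PREG_SET_ORDER);

    $codes = [];
    foreach ($matches as $m) {
        if (isset($m[1])) {
            $codes[] = hexdec($m[1]);                       // \uXXXX escape
        } elseif (strlen($m[0]) === 2) {
            $codes[] = ord(json_decode('"' . $m[0] . '"')); // short escape like \/
        } else {
            $codes[] = ord($m[0]);                          // plain ASCII character
        }
    }
    return $codes;
}

$codes = jsCharCodes("t\u{1F618}/");
echo "Length is: ", count($codes), "\n";
foreach ($codes as $i => $code) {
    echo "charCodeAt($i): $code\n";
}
```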


Answer:

To emulate JavaScript's charCodeAt, you can do this:

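function JS_charCodeAt($str, $index) {
    // Re-encode as UTF-16LE: every code unit becomes 2 bytes, low byte first.
    $utf16 = mb_convert_encoding($str, 'UTF-16LE', 'UTF-8');
    // Combine the two bytes of code unit $index into one 16-bit integer.
    return ord($utf16[$index*2]) + (ord($utf16[$index*2+1]) << 8);
}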

But charCodeAt is problematic and should be replaced by codePointAt. Most JavaScript code that uses charCodeAt on characters from the supplementary Unicode planes, such as emojis, is probably buggy. You can find code emulating codePointAt in the answers to the question "UTF-8 Safe Equivelant of ord or charCodeAt() in PHP".
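Here is a minimal sketch of codePointAt emulation, assuming mbstring. Like JS, it takes a UTF-16 code unit index and combines a high surrogate with a following low surrogate. jsCodePointAt is a hypothetical name, and null stands in for JS's undefined.

```php
<?php

// Hypothetical helper: emulates JS str.codePointAt(index).
// $index counts UTF-16 code units, as in JS; returns null when out of range.
function jsCodePointAt(string $str, int $index): ?int
{
    $units = array_values(unpack('v*', mb_convert_encoding($str, 'UTF-16LE', 'UTF-8')));
    if (!isset($units[$index])) {
        return null;
    }
    $first = $units[$index];
    // A high surrogate followed by a low surrogate forms one supplementary code point.
    if ($first >= 0xD800 && $first <= 0xDBFF && isset($units[$index + 1])) {
        $second = $units[$index + 1];
        if ($second >= 0xDC00 && $second <= 0xDFFF) {
            return (($first - 0xD800) << 10) + ($second - 0xDC00) + 0x10000;
        }
    }
    // Otherwise return the code unit itself, as JS does (including a lone surrogate).
    return $first;
}

$str = "t\u{1F618}";
var_dump(jsCodePointAt($str, 1)); // int(128536), i.e. U+1F618
var_dump(jsCodePointAt($str, 2)); // int(56856): index 2 is the low surrogate, as in JS
```

If JS index semantics are not needed, mb_ord() (PHP 7.2+) returns the code point of a single character directly. For example, mb_ord("\u{1F618}", 'UTF-8') gives 128536.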


Source: https://habr.com/ru/post/1662116/

