Your functions count different things.
Graphemes (13): 👍 🏿 ✌ 🏿️ (space) @ m e n t i o n
Code points (14): U+1F44D U+1F3FF U+270C U+1F3FF U+FE0F U+0020 U+0040 U+006D U+0065 U+006E U+0074 U+0069 U+006F U+006E
UTF-16 code units (17): D83D DC4D D83C DFFF 270C D83C DFFF FE0F 0020 0040 006D 0065 006E 0074 0069 006F 006E
UTF-16-encoded bytes (34): 3D D8 4D DC 3C D8 FF DF 0C 27 3C D8 FF DF 0F FE 20 00 40 00 6D 00 65 00 6E 00 74 00 69 00 6F 00 6E 00
UTF-8-encoded bytes (27): F0 9F 91 8D F0 9F 8F BF E2 9C 8C F0 9F 8F BF EF B8 8F 20 40 6D 65 6E 74 69 6F 6E
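As a sketch, the code point and byte rows above can be reproduced in PHP. This assumes PHP 7.4+ with the mbstring and iconv extensions enabled; `hexBytes` is a hypothetical helper name, not a built-in.

```php
<?php
// The example string, built from \u{...} escapes so the source file's encoding doesn't matter.
$str = "\u{1F44D}\u{1F3FF}\u{270C}\u{1F3FF}\u{FE0F} @mention";

// Code points: split the UTF-8 data into characters, then format each as U+XXXX.
$codePoints = array_map(
    fn(string $c): string => sprintf('U+%04X', mb_ord($c, 'UTF-8')),
    mb_str_split($str, 1, 'UTF-8')
);

// Hypothetical helper: dump a byte string as space-separated uppercase hex pairs.
function hexBytes(string $bytes): string
{
    return strtoupper(implode(' ', str_split(bin2hex($bytes), 2)));
}

echo implode(' ', $codePoints), "\n";                     // code point row
echo hexBytes(iconv('UTF-8', 'UTF-16LE', $str)), "\n";   // UTF-16 byte row
echo hexBytes($str), "\n";                                // UTF-8 byte row
```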
PHP strings are natively just sequences of bytes.
strlen() counts the number of bytes in a string: 27.
mb_strlen(..., 'utf-8') counts the number of code points (Unicode characters) in a string when its bytes are decoded into characters using the UTF-8 encoding: 14.
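A minimal sketch of the two counts side by side, assuming the mbstring extension is enabled and the string is held as UTF-8:

```php
<?php
// The example string, written with \u{...} escapes (PHP 7.0+) so it is guaranteed to be UTF-8.
$str = "\u{1F44D}\u{1F3FF}\u{270C}\u{1F3FF}\u{FE0F} @mention";

$bytes = strlen($str);                   // counts raw bytes of the UTF-8 data
$codePoints = mb_strlen($str, 'UTF-8');  // counts code points after decoding as UTF-8

echo "$bytes bytes, $codePoints code points\n";  // prints "27 bytes, 14 code points"
```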
(The other examples are pretty much meaningless, because they process the input string as if it were in one encoding when it actually contains data in a different one.)
NSString lengths are counted in UTF-16 code units. There are 17 rather than 14 because three of the code points (👍 and the two 🏿 skin-tone modifiers) do not fit in a single UTF-16 code unit, so each must be encoded as a surrogate pair of two code units. PHP has no function that counts a string's length in UTF-16 code units, but since each code unit is encoded as exactly two bytes, you can easily get the count by encoding the string to UTF-16 and dividing the number of bytes by two:
strlen(iconv('utf-8', 'utf-16le', $str)) / 2
(Note: the le suffix is needed to make iconv encode to a specific byte order of UTF-16, rather than picking a byte order itself and adding a byte order mark (BOM) to the start of the string to say which one it picked. That extra BOM would otherwise add one code unit to the count.)
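The iconv trick can be wrapped in a small function. This is a sketch assuming PHP 7.4+ with iconv, mbstring and PCRE available; `utf16Length` and `utf16LengthByCodePoints` are hypothetical names. The second function cross-checks the result without iconv, using the surrogate-pair rule: every code point above U+FFFF needs two code units.

```php
<?php
// Hypothetical helper: length in UTF-16 code units (what NSString's length reports).
function utf16Length(string $utf8): int
{
    // 'UTF-16LE' pins the byte order, so iconv does not prepend a BOM.
    $utf16 = iconv('UTF-8', 'UTF-16LE', $utf8);
    if ($utf16 === false) {
        throw new InvalidArgumentException('input is not valid UTF-8');
    }
    // Every UTF-16 code unit is exactly two bytes.
    return intdiv(strlen($utf16), 2);
}

// Hypothetical cross-check: code points plus one extra unit per supplementary code point.
function utf16LengthByCodePoints(string $utf8): int
{
    // Code points U+10000..U+10FFFF are encoded as surrogate pairs.
    $supplementary = preg_match_all('/[\x{10000}-\x{10FFFF}]/u', $utf8);
    return mb_strlen($utf8, 'UTF-8') + $supplementary;
}

$str = "\u{1F44D}\u{1F3FF}\u{270C}\u{1F3FF}\u{FE0F} @mention";
echo utf16Length($str), "\n";              // 17
echo utf16LengthByCodePoints($str), "\n";  // 17
```

Dividing by two with `intdiv` keeps the result an integer, whereas the plain `/ 2` expression returns an int only when the byte count is even (which it always is for valid UTF-16).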