Emoji Value Range

I am trying to strip all emoji characters from a string (as a sanitization step), but I cannot find the full set of emoji values.

What is the complete set of UTF-16 values for emoji characters?

+6
2 answers

Unicode® Technical Report No. 51 includes a list of emoji (emoji-data.txt):

...
21A9 ; text ; L1 ; none ; j # V1.1 (↩) LEFTWARDS ARROW WITH HOOK
21AA ; text ; L1 ; none ; j # V1.1 (↪) RIGHTWARDS ARROW WITH HOOK
231A ; emoji ; L1 ; none ; j # V1.1 (⌚) WATCH
231B ; emoji ; L1 ; none ; j # V1.1 (⌛) HOURGLASS
...

I believe you will want to remove every character listed in that file that has a Default_Emoji_Style of emoji.

There is no way to identify emoji characters in Unicode other than consulting a definition list like this one. As the FAQ mentions, they are spread across several different blocks.
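As a rough illustration of that approach, here is a minimal Python sketch. It assumes the data file uses the semicolon-separated layout shown in the excerpt above, with the code point in the first field and Default_Emoji_Style in the second. The names `SAMPLE`, `load_emoji_code_points`, and `strip_emoji` are made up for this example, and the sample text contains only the four excerpted lines rather than the real file. Other versions of the file may use a different layout, so treat the parsing as an assumption.

```python
# Four lines copied from the excerpt above; the real emoji-data.txt is much longer.
SAMPLE = """\
21A9 ; text ; L1 ; none ; j # V1.1 (↩) LEFTWARDS ARROW WITH HOOK
21AA ; text ; L1 ; none ; j # V1.1 (↪) RIGHTWARDS ARROW WITH HOOK
231A ; emoji ; L1 ; none ; j # V1.1 (⌚) WATCH
231B ; emoji ; L1 ; none ; j # V1.1 (⌛) HOURGLASS
"""


def load_emoji_code_points(data_text):
    """Collect code points whose second field (Default_Emoji_Style) is 'emoji'."""
    emoji = set()
    for line in data_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not line:
            continue
        fields = [field.strip() for field in line.split(";")]
        if len(fields) < 2 or fields[1] != "emoji":
            continue
        parts = fields[0].split()
        if len(parts) != 1:
            # Assumption: entries made of several code points are skipped here.
            continue
        emoji.add(int(parts[0], 16))
    return emoji


def strip_emoji(text, emoji_code_points):
    """Return text without any character whose code point is in the set."""
    return "".join(ch for ch in text if ord(ch) not in emoji_code_points)


if __name__ == "__main__":
    emoji = load_emoji_code_points(SAMPLE)
    print(strip_emoji("time \u231a back \u21a9", emoji))
```

With this sample, WATCH (U+231A) is removed while LEFTWARDS ARROW WITH HOOK (U+21A9) is kept, because its default style is text.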

+4

If you are dealing only with English characters and emoji, I think this is doable. First convert your string to UTF-16, then check each code unit: any whose value is 0xD800 or greater (for emoji, actually >= 0xD83C) should be part of an emoji.

This works because "the Unicode standard permanently reserves the code point values between 0xD800 and 0xDFFF for UTF-16 encoding of the high and low surrogates", and English characters (along with many other characters) do not fall into this range.

But since emoji code points start at U+1F300, their UTF-16 values do fall within this range.

Check here for a quick reference of emoji UTF-16 values if you would rather not work them out yourself.
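To make the arithmetic concrete, here is a small Python sketch of the standard UTF-16 surrogate calculation, plus a filter that drops every surrogate code unit. The function names `to_surrogate_pair` and `strip_surrogate_pairs` are hypothetical and chosen only for this example. Note that this approach removes every character outside the Basic Multilingual Plane, not just emoji, and it leaves emoji below U+10000 (such as U+231A WATCH) untouched.

```python
def to_surrogate_pair(code_point):
    """Return the UTF-16 (high, low) surrogates for a code point above U+FFFF."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


def strip_surrogate_pairs(text):
    """Remove every character that UTF-16 encodes as a surrogate pair."""
    units = text.encode("utf-16-le")  # two bytes per code unit, no BOM
    kept = bytearray()
    for i in range(0, len(units), 2):
        unit = int.from_bytes(units[i:i + 2], "little")
        if 0xD800 <= unit <= 0xDFFF:
            continue  # high or low surrogate: drop it
        kept += units[i:i + 2]
    return kept.decode("utf-16-le")


if __name__ == "__main__":
    high, low = to_surrogate_pair(0x1F300)
    print(hex(high), hex(low))
    print(strip_surrogate_pairs("hi \U0001F300!"))
```

For U+1F300 the offset is 0xF300, so the high surrogate is 0xD800 + 0x3C = 0xD83C and the low surrogate is 0xDC00 + 0x300 = 0xDF00.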

-1

Source: https://habr.com/ru/post/987964/

