I convert the vector's encoding to UTF-8 so that each emoji's UTF-8 value can be compared with the emoji values in the remoji library, which are also UTF-8. I use the stringr library to find the emoji positions in the vector, but grep or any similar function works too.
1st method:
    library(stringr)
    xvect = c('😂', 'no', '🍹', '😀', 'no', '😛')
    Encoding(xvect) <- "UTF-8"
    which(str_detect(xvect, "[^[:ascii:]]"))
This returns 1, 3, 4 and 6, which are the positions of the emoji in this vector.
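As mentioned, grep-style functions work as well. A minimal base R sketch of the same check, assuming a PCRE byte range stands in for stringr's [:ascii:] class (the emoji are written as \U escapes only so the snippet does not depend on the file's encoding, and they are placeholder values):

```r
# Placeholder emoji written as Unicode escapes; 'no' is plain ASCII.
xvect <- c("\U0001F602", "no", "\U0001F379", "\U0001F600", "no", "\U0001F61B")

# Flag every element that contains at least one character outside 0x01-0x7F.
non_ascii <- grepl("[^\\x01-\\x7F]", xvect, perl = TRUE)

# Positions of the flagged elements.
pos <- which(non_ascii)
print(pos)
```

Like the stringr version, this only tells you that an element is not pure ASCII, not that it is an emoji.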
Edit:
Second method: install the remoji package with devtools using the commands below. Since the emoji elements are already converted to UTF-8, we can now compare them against the UTF-8 values of all the emoji in the remoji library. Use trimws to remove surrounding whitespace.
    install.packages("devtools")
    devtools::install_github("richfitz/remoji")
    library(remoji)
    emj <- emoji(list_emoji(), TRUE)
    xvect %in% trimws(emj)
To get the positions instead of a logical vector:
    which(xvect %in% trimws(emj))
Neither method is foolproof. The first assumes that the vector contains no non-ASCII characters other than emoji, and the second relies on the contents of the remoji library. If an emoji is missing from the library, the last command returns FALSE for it instead of TRUE.
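To illustrate the first caveat, here is a small sketch with made-up input, using a base R non-ASCII test as a stand-in for the stringr call. An accented letter is flagged just like an emoji:

```r
# Hypothetical input: an accented word, an emoji, and a plain ASCII word.
x <- c("caf\u00e9", "\U0001F602", "plain")

# The non-ASCII test cannot tell the accented letter from the emoji.
flagged <- grepl("[^\\x01-\\x7F]", x, perl = TRUE)
print(flagged)
```

So if the data can contain accented text or other scripts, a non-ASCII test alone will over-report.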
Final Edit:
Following the discussion between the OP (@MichaelChirico) and @SymbolixAU (thanks to both of them), the problem seems to have been a small typo: the escape needs a capital U. The new regular expression is xvect[grepl('[\U{1F300}-\U{1F6FF}]', xvect)]. The character class covers the range U+1F300 to U+1F6FF. Of course, you can change or extend this range if your emoji fall outside it. It may not be a complete list, and these ranges may grow or change over time.
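If you do need to widen the range, one option is to keep the ranges in a vector and build the character class from it. This is only a sketch: emoji_ranges and has_emoji are hypothetical names, and the two extra Unicode blocks are examples of what you might add, not a complete emoji list.

```r
# Hypothetical helper: each entry is one code point range for the character class.
emoji_ranges <- c(
  "\U0001F300-\U0001F6FF",  # the range used in the regular expression above
  "\U0001F900-\U0001F9FF",  # Supplemental Symbols and Pictographs block
  "\u2600-\u26FF"           # Miscellaneous Symbols block
)

# Collapse the ranges into a single character class: [range1range2range3]
emoji_pattern <- paste0("[", paste(emoji_ranges, collapse = ""), "]")

has_emoji <- function(x) grepl(emoji_pattern, x)

# Made-up test vector: emoji, plain text, an accented word, and symbols
# that fall in the two added blocks.
x <- c("\U0001F602", "no", "\U0001F379", "caf\u00e9", "\u2600", "\U0001F917")
res <- has_emoji(x)
print(x[res])
```

Unlike the non-ASCII test, this leaves accented text alone, at the cost of having to maintain the list of ranges yourself.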