How to search for emoticons / emoji in Elasticsearch?

I am trying to search for text containing a smiley / emoji in Elasticsearch. I have indexed tweets into ES, and now I want to find tweets that contain smiley or sad faces. I tried the following:

1) I used the Unicode value of the smiley in the query, but it did not work; no results were returned.

GET /myindex/twitter_stream/_search
{
  "query": {
    "match": {
      "text": "\u1f603"
    }
  }
}

How do I set up emoji search in Elasticsearch? Do I need to encode the raw tweets before ingesting them into Elasticsearch? What would the query look like? Are there any tried-and-tested approaches? Thank you.

+5
2 answers

The Unicode emoji specification explains how to search for emoji:

Searching includes both searching for emoji characters in queries and finding emoji characters in the target. These are most useful when they include the annotations as synonyms or hints. For example, when someone searches for ⛽ on yelp.com, they see matches for "gas station". Conversely, searching for "gas pump" in a search engine could find pages containing ⛽.

Annotations are language-specific: when searching on yelp.de, someone would expect a search for ⛽ to return matches for "Tankstelle".

You can store the actual Unicode character and expand it to its annotations in each language you want to support.

This can be done with a synonym filter. However, the standard Elasticsearch tokenizer removes emoji, so there is quite a bit of work to do:

  • remove emoji modifiers and clean everything up;
  • tokenize on whitespace;
  • remove unwanted punctuation;
  • expand each emoji to its synonyms.
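
The four steps above can be sketched as index settings, written here as a Python dict that serializes to the JSON body. This is only an illustration under assumptions: the analyzer and filter names (`emoji_text`, `emoji_modifiers`, `trim_punctuation`, `drop_empty`, `emoji_synonyms_en`) and the tiny synonym list are hypothetical, and it is not necessarily the configuration used in the linked article.

```python
import json

# Hypothetical annotations; a real list would be generated per language.
EMOJI_SYNONYMS_EN = [
    "\U0001F603 => \U0001F603, smiley, happy",
    "\u26FD => \u26FD, fuel, gas",
]

INDEX_BODY = {
    "settings": {
        "analysis": {
            "char_filter": {
                # Step 1: drop skin-tone modifiers and variation selectors.
                "emoji_modifiers": {
                    "type": "pattern_replace",
                    "pattern": "[\\x{1F3FB}-\\x{1F3FF}\\x{FE0E}\\x{FE0F}]",
                    "replacement": "",
                }
            },
            "filter": {
                # Step 3: trim ASCII punctuation from both ends of a token.
                # Narrow the class if characters such as # or @ should be kept.
                "trim_punctuation": {
                    "type": "pattern_replace",
                    "pattern": "^\\p{Punct}+|\\p{Punct}+$",
                    "replacement": "",
                },
                # Tokens that consisted only of punctuation are now empty.
                "drop_empty": {"type": "length", "min": 1},
                # Step 4: expand each emoji to its annotations.
                "emoji_synonyms_en": {
                    "type": "synonym",
                    "synonyms": EMOJI_SYNONYMS_EN,
                },
            },
            "analyzer": {
                "emoji_text": {
                    "type": "custom",
                    "char_filter": ["emoji_modifiers"],
                    "tokenizer": "whitespace",  # Step 2: split on whitespace only.
                    "filter": [
                        "trim_punctuation",
                        "drop_empty",
                        "lowercase",
                        "emoji_synonyms_en",
                    ],
                }
            },
        }
    }
}

# JSON body for the index-creation request.
REQUEST_BODY = json.dumps(INDEX_BODY)
```

Such a body would be sent when creating the index (for example `PUT /myindex`), and the `text` field would need to be mapped to the `emoji_text` analyzer. Tweets indexed before the change would have to be reindexed. Whitespace tokenization also means an emoji glued to a word (no space in between) stays part of that token.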

The whole process is described here: http://jolicode.com/blog/search-for-emoji-with-elasticsearch (disclaimer: I am the author).
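
A separate detail worth checking is the escape in the question's query. In JSON, `\u` is followed by exactly four hex digits, so `"\u1f603"` is read as U+1F60 followed by a literal `3`, not as U+1F603. A character outside the Basic Multilingual Plane has to be written as a surrogate pair or sent as the raw character. The short Python check below illustrates this; the `match_query` helper is a hypothetical name used only for the example.

```python
import json

SMILEY = "\U0001F603"  # U+1F603, the smiley the question is after

# As written in the question: parsed as U+1F60 plus the digit "3".
as_written = json.loads('"\\u1f603"')

# The escaped JSON form of U+1F603 is a UTF-16 surrogate pair.
escaped = json.dumps(SMILEY)       # ASCII-only output using \u escapes
round_trip = json.loads(escaped)   # decodes back to the single character


def match_query(field: str, text: str) -> str:
    """Build a match query body, keeping the emoji as a raw UTF-8 character."""
    return json.dumps({"query": {"match": {field: text}}}, ensure_ascii=False)


print(len(as_written), escaped, round_trip == SMILEY)
print(match_query("text", SMILEY))
```

Fixing the escape is necessary but not sufficient: with the standard tokenizer the emoji is still dropped at analysis time, so the analyzer changes listed above are needed as well.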

+6

From my experience working with emoticons, a string is usually stored in place of the image when they are saved to a database. For example, a smile is stored as :smile:. You can check whether this is true in your case. If so, you can add a custom tokenizer that does not split on colons, so that emoticons can be matched exactly. Then, at search time, you only need to convert the emoticon image in the query to the corresponding string, and Elasticsearch will be able to find it. Hope this helps.
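
The search-time conversion might look like the following minimal sketch. It assumes the tweets were stored with colon-delimited codes and that the field's analyzer keeps such codes as single tokens (a whitespace tokenizer would). The `SHORTCODES` table and both function names are hypothetical.

```python
# Hypothetical table; the stored form depends on how the tweets were ingested.
SHORTCODES = {
    "\U0001F604": ":smile:",
    "\U0001F61E": ":disappointed:",
}


def to_shortcodes(text: str) -> str:
    """Replace each known emoji with its colon-delimited code."""
    for emoji, code in SHORTCODES.items():
        # Pad with spaces so the code becomes a separate token.
        text = text.replace(emoji, f" {code} ")
    # Collapse the extra whitespace introduced by the padding.
    return " ".join(text.split())


def build_query(user_input: str, field: str = "text") -> dict:
    """Build a match query whose text uses the stored shortcode form."""
    return {"query": {"match": {field: to_shortcodes(user_input)}}}


print(build_query("so happy\U0001F604"))
```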

+1

Source: https://habr.com/ru/post/1239923/

