Filtering out off-topic tweets

Suppose I have a website where users create threads and post in threads about fruit.

To keep users informed of all fruit conversations around the web, I collect tweets related to a particular topic and create threads based on the contents of each tweet.

Obviously, it is very important that the tweets are relevant to the topic. Let's say a user creates the topic "Apples and Oranges". I pull in all the tweets containing the keywords apples and/or oranges.

The problem I am facing is that some Twitter users write tweets that include the keywords "Apples", "Oranges", and "Pears", for example, and these get collected and published as a thread in the discussion topic "Apples and Oranges". It makes users angry!

So, I need a way to filter out any tweet that includes fruit words other than apples and / or oranges.

For example, if a Twitter user writes “I love apples, oranges, pears and grapes,” then this tweet should not be included.

Now, you can only make a Twitter search so complicated. So the exclusion logic must be executed in Ruby after the tweets have been collected.

Programmatically, how would you go about solving this?

+4
5 answers

Identify the words related to the topic's keywords (pears, grapes, etc.). You can then exclude tweets that use these related words.

One way to do this is to use Google Sets.

NOTE: I am in the unfortunate position of not being able to fully stand behind my own solution, because this service does not have an official API (how surprising!). If you do use this strategy, though, I would suggest caching the Google Sets result.

    require 'google_set'

    twitter_search_terms = ['apples', 'oranges']

    # Mocked Twitter search method
    tweets = search_twitter(twitter_search_terms)
    # returns ["Both apples and oranges are great!", "I love Apples, Oranges, Pears, and Grapes."]

    related_words = GoogleSet.for(*twitter_search_terms)
    # returns ["apples", "oranges", "bananas", "peaches", "pears", "grapes", "strawberries", "plums", ...]

    # Drop the search terms themselves; map (not each) so the downcased words are kept
    related_words = (related_words - twitter_search_terms).map(&:downcase)

    good_tweets = []
    bad_tweets = []

    tweets.each do |tweet|
      # Lowercase, split on whitespace, strip non-word characters, drop empty strings
      tweet_words = tweet.downcase.split.map { |word| word.gsub(/\W+/, '') }.reject(&:empty?)

      # A tweet is good only if it shares no words with the related (off-topic) list
      if (tweet_words & related_words).empty?
        good_tweets << tweet
      else
        bad_tweets << tweet
      end
    end

    p good_tweets
    # returns ["Both apples and oranges are great!"]
    p bad_tweets
    # returns ["I love Apples, Oranges, Pears, and Grapes."]
+7
    class Fruit < ActiveRecord::Base
      has_many :tweets
    end

    class Tweet < ActiveRecord::Base
      belongs_to :fruit

      # If validation passes, the fruit is parsed from the text
      before_create :parse_fruit_from_text

      # Validation catches any tweets that mention more than one fruit
      def validate
        errors.add(:base, 'Mentions too many fruit') unless single_topic?
      end

      def single_topic?
        Fruit.count(:conditions => { :name => words }).eql?(1)
      end

      def parse_fruit_from_text
        self.fruit_id = Fruit.first(:conditions => { :name => words }, :select => 'id').id
      end

      def words
        @words ||= text.split(' ')
      end
    end

    # Now you can just do...
    Tweet.create(json)

You will need to account for differences in letter case in Fruit#name. I would suggest simply storing all the names in lowercase and then downcasing any queries. Alternatively, you can write your own SQL queries using LOWER.

+1

Yes, you have to do it in Ruby. Right after fetching a tweet, check that it does not contain any keywords other than the search keyword. So, if you found a tweet by searching for "Apple", you should make sure that it does not contain any of the other (N-1) keywords, such as Orange, Grapes, etc.

Alternatively, you can split the tweet into words and then make sure that none of the words matches any of your keywords other than the search keyword. This will be faster, since a tweet will usually contain fewer words than there are keywords.
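The word-splitting approach could be sketched roughly as follows. This is only an illustration: the `on_topic?` helper and the `all_keywords` list are hypothetical names, and it assumes you maintain the full keyword list yourself.

```ruby
require 'set'

# Hypothetical helper: returns true when the tweet mentions no keyword
# from all_keywords other than the ones that were searched for.
def on_topic?(tweet, search_terms, all_keywords)
  # Every known keyword that is NOT one of the search terms is off-topic.
  excluded = Set.new(all_keywords.map(&:downcase)) - search_terms.map(&:downcase)

  # Split the tweet into lowercase words, ignoring punctuation.
  words = tweet.downcase.scan(/\w+/)

  # One set lookup per tweet word, so the cost grows with the tweet length.
  words.none? { |word| excluded.include?(word) }
end

fruit = %w[apples oranges pears grapes kumquats]
on_topic?('Both apples and oranges are great!', %w[apples oranges], fruit)         # => true
on_topic?('I love Apples, Oranges, Pears, and Grapes.', %w[apples oranges], fruit) # => false
```

Putting the excluded keywords in a `Set` keeps each lookup cheap, which is what makes iterating over the tweet's words preferable to iterating over the keyword list.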

0

Take a look at the Ruby Classifier gem.

0

As an additional suggestion, given that your site is most likely not really about fruit, you can decide which keywords to exclude by using the other groups that users have created on your site.

For example, if someone creates an “Apples” group and someone else creates an “Oranges” group, then a tweet about “Apples and Oranges” will correctly be displayed in neither, but a tweet about oranges and kumquats will be displayed in the Oranges thread until someone creates a group for Kumquats.
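A rough sketch of that idea, assuming each group is stored as a name mapped to its list of keywords (the `excluded_keywords` and `belongs_in_group?` helpers and the hash layout are hypothetical, chosen only for illustration):

```ruby
# Hypothetical helper: keywords of every other group become exclusion
# terms for the current group.
def excluded_keywords(current_keywords, all_groups)
  current = current_keywords.map(&:downcase)
  all_groups.values.flatten.map(&:downcase).uniq - current
end

# A tweet belongs in a group when it mentions at least one of the group's
# keywords and none of the keywords owned by other groups.
def belongs_in_group?(tweet, current_keywords, all_groups)
  words = tweet.downcase.scan(/\w+/)
  current = current_keywords.map(&:downcase)
  (words & current).any? &&
    (words & excluded_keywords(current_keywords, all_groups)).empty?
end

groups = { 'Apples' => ['apples'], 'Oranges' => ['oranges'] }
belongs_in_group?('Apples and oranges!', ['oranges'], groups)   # => false
belongs_in_group?('Oranges and kumquats', ['oranges'], groups)  # => true

# Once someone creates a Kumquats group, the same tweet is excluded.
groups['Kumquats'] = ['kumquats']
belongs_in_group?('Oranges and kumquats', ['oranges'], groups)  # => false
```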

0

Source: https://habr.com/ru/post/1341097/

