Predicting the sentiment of a user's next tweet on Twitter

I'm trying to predict the sentiment of the next tweet a Twitter user will post. So far I have the following steps (steps 1 and 2 are already implemented in Python):

  • Learn to classify a tweet as positive (1), neutral (0), or negative (-1). For this I use a naive Bayes classifier, and it works very well.

  • Classify the user's existing tweets. This yields a sequence of numbers like [0, 1, -1, -1, -1, 0, 1, 1, ...]. I also know when each tweet was posted.

Is it possible to predict the sentiment (1, 0 or -1) of the next tweet?

Which algorithm could I use for this?

I don't yet know much about how they work, but would hidden Markov models be suitable, or some kind of regression?


I think a nice way to think about this is in terms of a prior and a likelihood over sentiments. Naive Bayes is a likelihood model (how likely am I to see this exact tweet, given that it is positive?). What you are asking about is the prior probability that the next tweet will be positive, given the sequence of sentiments observed so far. There are several ways to estimate it:

  • The most naive way: the fraction of the user's past tweets that were positive is taken as the probability that the next one will be positive.
  • However, this ignores recency. You can instead build a transition-based model: for each possible previous state, estimate the probability that the next tweet will be positive, negative, or neutral. This gives a 3x3 transition matrix, and the conditional probability that the next tweet is positive given that the last one was positive is simply the pos -> pos transition probability. These probabilities can be estimated from counts, and the result is a Markov process (the key assumption being that only the previous state matters).
  • You can make these transition models more and more elaborate; for example, the current "state" could be the sentiment of the last two, or even the last n, tweets. This gives more specific predictions at the cost of more and more parameters in the model, which you can counter with smoothing schemes, parameter tying, and so on.
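The estimators in the list above fit in a few lines of Python. This is a minimal sketch, not code from the question: the names `frequency_prior` and `transition_model` are hypothetical, and add-alpha (Laplace) smoothing is just one of the possible smoothing schemes. With `order=1` the nested counts are exactly the 3x3 transition matrix; a larger `order` conditions on the last n sentiments.

```python
from collections import Counter, defaultdict

LABELS = (-1, 0, 1)  # negative, neutral, positive


def frequency_prior(seq):
    """Naive baseline: fraction of past tweets carrying each label."""
    counts = Counter(seq)
    n = len(seq)
    return {s: counts[s] / n for s in LABELS}


def transition_model(seq, order=1, alpha=1.0):
    """Estimate P(next label | previous `order` labels) from counts.

    `alpha` is add-alpha (Laplace) smoothing, so contexts that were seen
    rarely or never still get a sensible distribution.
    """
    counts = defaultdict(Counter)  # context tuple -> counts of the next label
    for i in range(order, len(seq)):
        context = tuple(seq[i - order:i])
        counts[context][seq[i]] += 1

    def predict(history):
        context = tuple(history[-order:])
        c = counts.get(context, Counter())  # empty if this context never occurred
        total = sum(c.values()) + alpha * len(LABELS)
        return {s: (c[s] + alpha) / total for s in LABELS}

    return predict


history = [0, 1, -1, -1, -1, 0, 1, 1]
print(frequency_prior(history))                      # {-1: 0.375, 0: 0.25, 1: 0.375}
print(transition_model(history, order=1)(history))   # {-1: 0.4, 0: 0.2, 1: 0.4}
print(transition_model(history, order=2)(history))   # uniform: context (1, 1) never seen
```

The second-order call shows why smoothing matters: with only eight labels most two-tweet contexts have never occurred, so without `alpha` the estimate would be undefined.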

As a final point, I think @Anony-Mousse is right that the prior is weak evidence: whatever the user said earlier, I expect the likelihood function (that is, the content of the tweet in question) to dominate. If you do get to see the tweet itself, consider a CRF, as @Neil McGuigan suggests.
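To make the prior/likelihood split concrete, here is a minimal sketch of combining the two with Bayes' rule: P(s | tweet, history) ∝ P(s | history) · P(tweet | s). It assumes you can get per-class log-likelihoods log P(tweet | s) out of your naive Bayes classifier (if it only reports posteriors that already include its own class prior, that prior would have to be divided out first). The function name `posterior` and the numbers in the usage lines are hypothetical, chosen only to illustrate the point about the likelihood dominating.

```python
import math

LABELS = (-1, 0, 1)  # negative, neutral, positive


def posterior(prior, log_likelihood):
    """Combine a sequence-based prior P(s) with log P(tweet | s).

    Works in log space and subtracts the maximum before exponentiating,
    so very negative log-likelihoods (long tweets) do not underflow.
    """
    log_scores = {s: math.log(prior[s]) + log_likelihood[s] for s in LABELS}
    m = max(log_scores.values())
    unnorm = {s: math.exp(v - m) for s, v in log_scores.items()}
    z = sum(unnorm.values())
    return {s: u / z for s, u in unnorm.items()}


# Hypothetical inputs: a Markov prior and illustrative classifier log-likelihoods
markov_prior = {-1: 0.4, 0: 0.2, 1: 0.4}
tweet_loglik = {-1: -40.0, 0: -35.0, 1: -30.0}
print(posterior(markov_prior, tweet_loglik))  # almost all mass lands on +1
```

A log-likelihood gap of a few nats already outweighs any plausible difference in the prior, which is why the tweet's content tends to dominate once it is available.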


On the machine learning side, you could look at sequential association rules:

http://web.mit.edu/rudin/www/RudinEtAlCOLT11.pdf

There are several Java libraries on this site:

http://www.philippe-fournier-viger.com/spmf/

A hidden Markov model should also work. An HMM is a special case of a conditional random field, and a CRF lets you take other factors into account, such as the weather or news events.
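A minimal sketch of how an HMM turns the observed label sequence into a prediction for the next label, assuming its parameters are already known. The two hidden states ("good mood" / "bad mood") and every probability below are hand-picked, hypothetical values; in practice they would be fitted to the user's labels with the Baum-Welch (EM) algorithm, for instance with a library such as hmmlearn, which is not shown here. The function `predict_next_label` is illustrative: it runs the normalised forward algorithm to get the distribution over the current hidden state, pushes that one step through the transition matrix, and maps it to labels through the emission probabilities.

```python
LABELS = (-1, 0, 1)  # negative, neutral, positive


def _normalise(v):
    z = sum(v)
    return [x / z for x in v]


def predict_next_label(obs, start, trans, emit):
    """P(next label | labels observed so far) for a discrete HMM.

    start[i]    = P(hidden state i at the first tweet)
    trans[i][j] = P(state j at the next tweet | state i now)
    emit[i][s]  = P(observed label s | hidden state i)
    """
    k = len(start)
    if obs:
        # Forward algorithm, normalised at each step:
        # alpha[i] = P(current hidden state i | labels observed so far)
        alpha = _normalise([start[i] * emit[i][obs[0]] for i in range(k)])
        for s in obs[1:]:
            alpha = _normalise([
                sum(alpha[i] * trans[i][j] for i in range(k)) * emit[j][s]
                for j in range(k)
            ])
        # Advance one step: distribution over the next tweet's hidden state
        nxt = [sum(alpha[i] * trans[i][j] for i in range(k)) for j in range(k)]
    else:
        nxt = list(start)  # no history yet: use the initial state distribution
    # Mix each state's label distribution by how likely that state is
    return {s: sum(nxt[j] * emit[j][s] for j in range(k)) for s in LABELS}


# Hypothetical parameters: state 0 = "good mood", state 1 = "bad mood"
start = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
emit = [{-1: 0.1, 0: 0.3, 1: 0.6}, {-1: 0.6, 0: 0.3, 1: 0.1}]

print(predict_next_label([1, -1, -1, -1], start, trans, emit))
```

After a run of negative labels the filtered state leans toward the hypothetical "bad mood" state, so -1 gets the highest probability of the three. Unlike the plain transition matrix, the hidden state can summarise a longer history without adding a parameter per context.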

I also wonder whether a person's tweets are influenced by the tweets they are currently seeing: a) from everyone, or b) only from the accounts they follow.


Source: https://habr.com/ru/post/1479055/

