Unsupervised sentiment analysis

I have read a lot of articles explaining that a sentiment analysis system needs an initial set of texts classified as "positive" or "negative" before it really works.

My question is: has anyone tried a rudimentary check of "positive" adjectives vs. "negative" adjectives, taking simple negations into account so that "not happy" is not classified as positive? If so, are there any articles that discuss why this strategy is unrealistic?
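The naive approach the question describes can be sketched in a few lines: count "positive" against "negative" adjectives and flip polarity when a negation word immediately precedes one. The word lists below are tiny illustrative stand-ins, not a real lexicon.

```python
# Minimal sketch of a lexicon-count sentiment check with naive
# negation handling.  The word sets are made-up examples.
POSITIVE = {"happy", "great", "excellent", "good", "wonderful"}
NEGATIVE = {"sad", "terrible", "poor", "bad", "awful"}
NEGATORS = {"not", "never", "no"}

def classify(text):
    tokens = text.lower().split()
    score = 0
    for i, tok in enumerate(tokens):
        polarity = 1 if tok in POSITIVE else -1 if tok in NEGATIVE else 0
        if polarity and i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity  # "not happy" counts as negative
        score += polarity
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(classify("the food was excellent and the staff were wonderful"))  # positive
print(classify("I am not happy with this product"))                     # negative
```

As the answers below point out, this breaks down quickly: the one-token negation window misses "not very happy", and context outside the lexicon is ignored entirely.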

+36
machine-learning nlp sentiment-analysis
Oct 13 '10 at 4:25
7 answers

A classic paper by Peter Turney (2002) explains a method for doing unsupervised sentiment analysis (positive/negative classification) using only the words excellent and poor as a seed set. Turney uses the mutual information of other words with these two adjectives to achieve an accuracy of 74%.

+55
Oct 14 '10 at 13:52

I have not tried unsupervised sentiment analysis like you describe, but off the top of my head I would say you are oversimplifying the problem. Simply analyzing adjectives is not enough to get a good grasp of the sentiment of a text; for example, consider the word "stupid". On its own you would classify it as negative, but if a product review said "... [x] product makes its competitors look stupid for not thinking of this feature first ...", then the sentiment there is certainly positive. The wider context in which words appear definitely matters in something like this. That is why an unsupervised bag of words alone (never mind an even more limited bag of adjectives) is not enough to solve this problem adequately.

Pre-classified data ("training data") helps in that the problem shifts from trying to determine from scratch whether a text carries positive or negative sentiment, to determining whether the text is more similar to the positive texts or to the negative texts, and classifying it accordingly. Another important point is that text analyses such as sentiment analysis are often strongly affected by domain-specific differences in the characteristics of texts. That is why having a good training dataset (i.e., accurate data from the domain you are working in, which is hopefully representative of the texts you need to classify) is just as important as building a good classification system.
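To make concrete what training data buys you, here is a minimal Naive Bayes sketch: it learns per-label word likelihoods from labeled reviews and scores a new text against each label. The four training reviews are made up for illustration; a real system would use NLTK or similar on a proper corpus.

```python
import math
from collections import Counter, defaultdict

# Tiny, invented training set: texts pre-classified as pos/neg.
train = [
    ("great phone excellent battery", "pos"),
    ("wonderful screen love it", "pos"),
    ("terrible battery awful support", "neg"),
    ("poor quality broke quickly", "neg"),
]

word_counts = defaultdict(Counter)
label_counts = Counter()
for text, label in train:
    label_counts[label] += 1
    word_counts[label].update(text.split())

vocab = {w for counts in word_counts.values() for w in counts}

def classify(text):
    best, best_lp = None, float("-inf")
    for label in label_counts:
        total = sum(word_counts[label].values())
        lp = math.log(label_counts[label] / sum(label_counts.values()))
        for w in text.split():
            # Laplace smoothing so unseen words don't zero out the score
            lp += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

print(classify("excellent battery"))  # pos
```

The classifier never decides what "positive" means from first principles; it only measures which class of training texts the input resembles more, which is exactly the shift described above.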

Not quite an article, but I hope this helps.

+15
Oct 13 '10 at 6:35

The Turney paper (2002) mentioned by larsmans is a good basis. In more recent work, Lin and He (2009) introduced an approach using Latent Dirichlet Allocation (LDA) to train a model that can classify the overall sentiment and topic of a document simultaneously, in a fully unsupervised fashion. The accuracy they achieve is 84.6%.

+3
Feb 02 '12 at 16:19

I tried identifying keywords using an affect dictionary to predict sentence-level sentiment labels. Given the generality of the dictionary (it is not domain-specific), the accuracy was only about 61%. The paper is available on my home page.

A slightly improved version handles negation adverbs. The whole system, called EmoLib, is available as a demo:

http://dtminredis.housing.salle.url.edu:8080/EmoLib/

Regards,

+2
Oct 13 '10 at 7:33

David,

I'm not sure if this helps, but you might want to check out Jacob Perkins's blog post on sentiment analysis with NLTK.

+2
Nov 22 '10 at 8:28

I have tried several sentiment analysis techniques for opinion mining in reviews. What worked best for me is the method described in Liu's book: http://www.cs.uic.edu/~liub/WebMiningBook.html In this book, Liu et al. compare many strategies and discuss various papers on sentiment analysis and opinion mining.

Although my main goal was extracting features from the opinions, I implemented a sentiment classifier to detect the positive or negative polarity of these features.

I used NLTK for preprocessing (word tokenization, POS tagging) and for building trigrams. I also used the Naive Bayes classifiers it includes, as a baseline to compare with the other strategies Liu discusses.

One of the methods is based on tagging each opinion-bearing trigram as pos/neg and training a classifier on that data. The other method I tried, which worked better (about 85% accuracy on my dataset), was computing for each sentence the sum of PMI (pointwise mutual information) scores between each word and the seed words excellent/poor for the pos/neg classes.
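The second method described above can be sketched as follows: each word carries a precomputed orientation score, and the sentence label is the sign of the sum. The scores in this table are invented; in practice each would come from PMI(word, "excellent") − PMI(word, "poor") estimated over a corpus.

```python
# Sketch of sentence-level classification by summing PMI-based
# orientation scores.  WORD_SO values are hypothetical placeholders
# for PMI(word, "excellent") - PMI(word, "poor").
WORD_SO = {
    "amazing": 2.1, "solid": 0.8, "fast": 0.9,
    "flimsy": -1.4, "disappointing": -2.3,
}

def sentence_sentiment(sentence):
    score = sum(WORD_SO.get(w, 0.0) for w in sentence.lower().split())
    return "pos" if score > 0 else "neg"

print(sentence_sentiment("amazing and fast"))          # pos
print(sentence_sentiment("flimsy and disappointing"))  # neg
```

Words absent from the table contribute nothing, so coverage of the seed-derived lexicon largely determines how well this works on a given domain.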

+2
Mar 07

There is no magic bullet in sentiment analysis, as with any other text analysis that tries to detect the underlying sentiment of a piece of text. Trying to shortcut proven text-analysis methods with a simplistic adjective check or similar approaches leads to ambiguity, misclassification, and so on, which at the end of the day gives you poor sentiment accuracy. The noisier the source (e.g. Twitter), the harder the problem.

0
Sep 18
