How is the Vader polarity rating calculated in Python NLTK?

I use the Vader SentimentAnalyzer to obtain polarity scores. I used to use the probability estimates for positive/negative/neutral, but I just realized that the "compound" score, ranging from -1 (most negative) to 1 (most positive), would provide a single uniform measure of polarity. I wonder how the "compound" score is computed. Is it calculated from the [pos, neu, neg] vector?

2 answers

The VADER algorithm outputs sentiment scores for 4 classes of sentiments (https://github.com/nltk/nltk/blob/develop/nltk/sentiment/vader.py#L441), demonstrated by the usage example after this list:

  • neg : Negative
  • neu : Neutral
  • pos : Positive
  • compound : Compound (i.e. aggregated score)
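For example, calling the standard NLTK API returns all four scores in one dict (the numbers shown are from one run and may vary slightly across NLTK/VADER versions):

    from nltk.sentiment.vader import SentimentIntensityAnalyzer
    # nltk.download('vader_lexicon') may be needed first

    sia = SentimentIntensityAnalyzer()
    print(sia.polarity_scores("VADER is smart, handsome, and funny!"))
    # e.g. {'neg': 0.0, 'neu': 0.254, 'pos': 0.746, 'compound': 0.8316}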

Skimming through the code, the first occurrence of compound is at https://github.com/nltk/nltk/blob/develop/nltk/sentiment/vader.py#L421, where it computes:

    compound = normalize(sum_s)

The normalize() function is defined at https://github.com/nltk/nltk/blob/develop/nltk/sentiment/vader.py#L107 as follows:

    def normalize(score, alpha=15):
        """
        Normalize the score to be between -1 and 1 using an alpha that
        approximates the max expected value
        """
        norm_score = score / math.sqrt((score * score) + alpha)
        return norm_score

So, there is an alpha hyperparameter.
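To make this concrete, here is a tiny standalone reproduction of normalize() (same formula as the NLTK source) with a couple of sample values:

    import math

    def normalize(score, alpha=15):
        # score / sqrt(score^2 + alpha), which maps any real score into (-1, 1)
        return score / math.sqrt((score * score) + alpha)

    print(normalize(4))   #  4 / sqrt(16 + 15) =  4 / sqrt(31) ~  0.7184
    print(normalize(-2))  # -2 / sqrt(4 + 15)  = -2 / sqrt(19) ~ -0.4588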

As for sum_s, it is the sum of the sentiments argument passed to the score_valence() function: https://github.com/nltk/nltk/blob/develop/nltk/sentiment/vader.py#L413
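In the NLTK source, that sum boils down to roughly:

    sum_s = float(sum(sentiments))  # sentiments is the list of per-token valences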

And if we trace back this sentiments argument, we see that it is computed inside the polarity_scores() function at https://github.com/nltk/nltk/blob/develop/nltk/sentiment/vader.py#L217 :

    def polarity_scores(self, text):
        """
        Return a float for sentiment strength based on the input text.
        Positive values are positive valence, negative value are negative
        valence.
        """
        sentitext = SentiText(text)
        #text, words_and_emoticons, is_cap_diff = self.preprocess(text)
        sentiments = []
        words_and_emoticons = sentitext.words_and_emoticons
        for item in words_and_emoticons:
            valence = 0
            i = words_and_emoticons.index(item)
            if (i < len(words_and_emoticons) - 1 and item.lower() == "kind" and
                    words_and_emoticons[i + 1].lower() == "of") or \
                    item.lower() in BOOSTER_DICT:
                sentiments.append(valence)
                continue
            sentiments = self.sentiment_valence(valence, sentitext, item, i, sentiments)
        sentiments = self._but_check(words_and_emoticons, sentiments)

Looking at the polarity_scores() function, it iterates through the whole SentiText and applies the rule-based sentiment_valence() function to assign a valence score to each sentiment (https://github.com/nltk/nltk/blob/develop/nltk/sentiment/vader.py#L243); see Section 2.1.1 of http://comp.social.gatech.edu/papers/icwsm14.vader.hutto.pdf
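To illustrate one of those heuristics: VADER scales a word's lexicon valence when it is preceded by a booster word ("very", "extremely", ...). The following is a simplified sketch of that rule, not the actual NLTK code; the function name is made up, and the 0.293 constant approximates the empirically derived booster increment (B_INCR) in the VADER source:

    B_INCR = 0.293  # empirically derived booster increment used by VADER

    def apply_booster(valence, prev_word):
        # Hypothetical, simplified version of VADER's booster-word rule
        booster_dict = {"very": B_INCR, "extremely": B_INCR, "slightly": -B_INCR}
        scalar = booster_dict.get(prev_word.lower(), 0.0)
        if valence < 0:
            scalar *= -1  # boosters push negative valences further from zero
        return valence + scalar

    print(apply_booster(1.9, "very"))  # 1.9 + 0.293 = 2.193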

So, coming back to the compound score, we see that:

  • the compound score is a normalized score of sum_s, and
  • sum_s is the sum of the valences computed from some heuristics and a sentiment lexicon (aka sentiment intensity), and
  • the normalized score is simply sum_s divided by the square root of its square plus an alpha parameter that increases the denominator of the normalization function (spelled out below).
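Written out as a single line of code:

    compound = sum_s / math.sqrt((sum_s * sum_s) + alpha)  # alpha defaults to 15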

Is it computed from the [pos, neu, neg] vector?

Not really =)

If we look at the score_valence() function (https://github.com/nltk/nltk/blob/develop/nltk/sentiment/vader.py#L411), we see that the compound score is computed with sum_s before the pos, neg and neu scores are calculated with _sift_sentiment_scores(), which computes the individual pos, neg and neu scores using the raw scores from sentiment_valence() without the sum.
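For reference, this is roughly what _sift_sentiment_scores() does in the NLTK source (lightly condensed here, without the class wrapper):

    def _sift_sentiment_scores(sentiments):
        # split the per-token valences into positive, negative and neutral parts
        pos_sum, neg_sum, neu_count = 0.0, 0.0, 0
        for score in sentiments:
            if score > 0:
                pos_sum += float(score) + 1  # +1 compensates for neutral tokens counted as 1
            elif score < 0:
                neg_sum += float(score) - 1
            else:
                neu_count += 1
        return pos_sum, neg_sum, neu_count

The pos, neg and neu values reported by polarity_scores() are then these three quantities normalized into ratios that sum to 1, independently of compound.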


If we play with the math behind alpha, it seems the output of the normalization is rather unstable (if left unconstrained), depending on the value of alpha:

[plots of normalize(score) for alpha = 0, 15, 50000 and 0.001]

It gets funky when alpha is negative:

[plots of normalize(score) for alpha = -10, -1,000,000 and -1,000,000,000]
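The curves above can be reproduced with a short script along these lines (numpy and matplotlib assumed; alpha values other than the default 15 are purely for experimentation):

    import numpy as np
    import matplotlib.pyplot as plt

    def normalize(score, alpha=15):
        return score / np.sqrt(score * score + alpha)

    x = np.linspace(-100, 100, 2001)
    for alpha in (0.001, 15, 50000):
        plt.plot(x, normalize(x, alpha), label="alpha=%s" % alpha)
    # negative alphas make score**2 + alpha negative near zero,
    # so sqrt() returns nan there -- hence the "funky" plots
    plt.xlabel("sum_s")
    plt.ylabel("normalize(sum_s)")
    plt.legend()
    plt.show()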


There is a description in the "About the Scoring" section of the GitHub repository.

