What is the proper way to deal with (assess) variance when analyzing sentiment on different topics in relation to input quality?

I am analyzing sentiment on a social network across various topics, and I am concerned about input quality. How should we deal with the dispersion of scores within individual topics?

For example, we are trying to assess sentiment on a topic that is an event covering various keywords, say, an Innovation Week topic with the following keywords (or synonyms):

Innovation week = {"innovation week", "data solution", "emerging technologies", "august 30"...}.
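For concreteness, here is a minimal sketch of the setup I have in mind (the keyword set and the sentiment scores below are hypothetical placeholders, not real data): posts are matched by topic keywords, each post gets a sentiment score, and we then look at the spread of those scores for the topic as a whole.

```python
from statistics import mean, stdev

# Hypothetical sentiment scores (e.g. in [-1, 1]) for posts matched by each keyword.
# In practice these would come from your sentiment analysis pipeline.
scores_by_keyword = {
    "innovation week":       [0.6, 0.7, 0.5, 0.8],
    "data solution":         [0.2, -0.1, 0.4],
    "emerging technologies": [0.9, 0.8, 0.7],
    "august 30":             [-0.5, 0.1, -0.3, 0.6],
}

# Pool all scores for the topic and measure their dispersion.
all_scores = [s for scores in scores_by_keyword.values() for s in scores]
print(f"topic mean = {mean(all_scores):.2f}, std dev = {stdev(all_scores):.2f}")
```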

What if the standard deviation of the scores turns out to be very large? Should we then question:

  • The sentiment analysis algorithm?

  • Our input keywords?

  • Or do we just accept the results as they are, since they represent different people's views at the different levels of detail that make up the topic? The goal, ultimately, is to get a general understanding of sentiment on the topic.

I think the question sounds simple, but it is a concern in any sentiment analysis of social networks.

2 answers

The short answer is: both the algorithm and the input keywords, since they depend on each other. With the wrong input, high variance will appear with any algorithm, and with the wrong algorithm it will appear for any input.

In most of these cases, though, you should start by revisiting the algorithm.
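One way to tell the two apart is to break the spread down per keyword. The sketch below reuses the hypothetical per-keyword scores from the question: if the variance is concentrated in one ambiguous keyword (say, "august 30", which can match unrelated posts), the input keywords are the likelier culprit; if every keyword is equally noisy, look at the algorithm (or accept that opinions genuinely differ).

```python
from statistics import mean, stdev

# Hypothetical per-keyword sentiment scores for one topic.
scores_by_keyword = {
    "innovation week":       [0.6, 0.7, 0.5, 0.8],
    "data solution":         [0.2, -0.1, 0.4],
    "emerging technologies": [0.9, 0.8, 0.7],
    "august 30":             [-0.5, 0.1, -0.3, 0.6],
}

# Per-keyword spread: one noisy keyword suggests an input problem,
# uniformly high spread points at the algorithm or genuine disagreement.
for keyword, scores in scores_by_keyword.items():
    print(f"{keyword!r}: mean={mean(scores):.2f}, stdev={stdev(scores):.2f}, n={len(scores)}")
```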

You can also read this to better understand it: http://www.cs.cornell.edu/home/llee/omsa/omsa-published.pdf


If you are unsure of your algorithm, you can use the NLTK VADER sentiment analyzer to cross-check the results. But it may also be that the opinions really are so varied that the standard deviation is genuinely that large.
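A minimal sketch of such a cross-check with VADER (the example posts are placeholders; the `vader_lexicon` resource has to be downloaded once):

```python
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")  # one-time download of the VADER lexicon

sia = SentimentIntensityAnalyzer()

# Hypothetical posts matched by your topic keywords.
posts = [
    "Innovation week was fantastic, great data solution demos!",
    "The august 30 session on emerging technologies was a waste of time.",
]

# Compare these compound scores (in [-1, 1]) against your own algorithm's output.
for post in posts:
    compound = sia.polarity_scores(post)["compound"]
    print(f"{compound:+.3f}  {post}")
```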

Do you have test data to evaluate your algorithm against? If not, you should create some anyway, so that you can compute standard evaluation metrics for it.
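For example, with a small labeled test set you can compute standard metrics. The sketch below assumes scikit-learn and a hypothetical `my_classifier` function that wraps your own algorithm; both the test data and the stub are illustrative only.

```python
from sklearn.metrics import accuracy_score, classification_report

# Hypothetical labeled test set: (post text, gold label).
test_data = [
    ("Innovation week was fantastic!", "positive"),
    ("The data solution demo crashed twice.", "negative"),
    ("Emerging technologies keynote starts august 30.", "neutral"),
]

def my_classifier(text):
    """Placeholder for your own sentiment algorithm; returns a label."""
    return "positive"  # stub

gold = [label for _, label in test_data]
pred = [my_classifier(text) for text, _ in test_data]

print(accuracy_score(gold, pred))
print(classification_report(gold, pred, zero_division=0))
```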


