Training set - share of pos / neg / neutral offers

I flag twitter messages as positive, negative, neutral. I am trying to assess if there is any logic that can be used to determine the set of workouts, which part of the message should be positive / negative and neutral?

So, for example, for. if I train the Naive Bayes classifier with 1000 Twitter posts, then the pos: neg: neutral share is 33%: 33%: 33% or 25%: 25%: 50%

Logically in my head, it seems that I am training (that is, I give more samples for neutral ones), that the system will better identify neutral sentences, whether they are positive or negative - is this true? or am I missing some theory here?

Thanks Rahul

+3
source share
2 answers

The problem you are talking about is called the imbalance problem. Many machine learning algorithms do not work well when faced with unbalanced learning data, i.e. When instances of one class are significantly superior to instances of another class. Read this article, . , , - , . ( ). mjv , . , , . , , , , ML , . , , , , , , .

: // Twitter . , , , .. , , , , / .

+3

... ( ) (, , ) . ( )

  • [] ""
  • [] , , , .

[] ; - , , . (, , , , , , bi- ..), .

, , , , , , post facto, .
, , , , , , ( ), . , , , [], , ( :-(, ...)

, , 3 - 5 , ( ) .

+1

Source: https://habr.com/ru/post/1728339/


All Articles