Negation handling in NLP

I am currently working on a project where I want to extract emotions from text. Since I use conceptnet5 (a semantic network), I cannot simply prefix the words in a sentence that contains a negation word, because the prefixed words would not appear in the conceptnet5 API.

Here is an example:

The film wasn't so good.

Therefore, I decided that I could use WordNet's lemma functionality to replace adjectives with their antonyms in sentences containing negation words (not, ...).

In the previous example, the algorithm would detect wasn't and replace it with was not. It would then detect the negation word not and replace good with its antonym bad. The sentence would then read:

The film was so bad.

Although I realize this is not the most elegant approach, and it probably produces wrong results in many cases, I would still like to handle negation this way, since I frankly do not know a better approach.
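A minimal sketch of the replacement idea described above, assuming contractions have already been expanded to "not". The names ANTONYMS, NEGATIONS, INTENSIFIERS and replace_negated_adjective are hypothetical, and the small hand-written antonym table only stands in for a WordNet lookup so that the example runs without any corpus data.

```python
import re

# Hypothetical stand-in for a WordNet antonym lookup (assumption, not real data).
ANTONYMS = {"good": "bad", "bad": "good", "happy": "sad", "sad": "happy"}

# Negation words that trigger the replacement (assumed, English only).
NEGATIONS = {"not", "never", "no"}

# Words allowed between the negation and the adjective (assumed list).
INTENSIFIERS = {"so", "very", "that", "really", "too"}


def replace_negated_adjective(sentence, antonyms=ANTONYMS):
    """Drop a negation word and swap the adjective after it for its antonym."""
    # Crude tokenizer: runs of word characters, or single punctuation marks.
    tokens = re.findall(r"\w+|[^\w\s]", sentence)
    out = []
    i = 0
    while i < len(tokens):
        if tokens[i].lower() in NEGATIONS:
            # Skip over intensifiers to find the candidate adjective.
            j = i + 1
            while j < len(tokens) and tokens[j].lower() in INTENSIFIERS:
                j += 1
            if j < len(tokens) and tokens[j].lower() in antonyms:
                out.extend(tokens[i + 1:j])              # keep the intensifiers
                out.append(antonyms[tokens[j].lower()])  # swap the adjective
                i = j + 1
                continue
        out.append(tokens[i])
        i += 1
    # Re-join the tokens and remove the space before punctuation.
    return re.sub(r"\s+([^\w\s])", r"\1", " ".join(out))


print(replace_negated_adjective("The film was not so good."))
```

A negation that is not followed by a known adjective is left untouched, which shows one of the weaknesses of the approach: it only works as far as the antonym source reaches.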

Now to my problem: unfortunately, I did not find any library that would let me replace all occurrences of contracted negations (wasn't => was not). I could do it manually, replacing each occurrence with a regular expression, but then I would be stuck with English.

So I would like to ask whether any of you know a library, function, or better method that could help me here. I am currently using Python's nltk, which does not seem to contain such functionality, but I could be wrong.

Thank you in advance:)

1 answer

Cases like wasn't can be handled simply by tokenization ( tokens = nltk.word_tokenize(sentence) ): wasn't is split into was and n't .
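As a rough sketch of how the n't token could then be normalized to not: the function below takes a token list of the kind nltk.word_tokenize is expected to return, so it runs without NLTK itself. The names expand_negation_tokens and IRREGULAR_STEMS are hypothetical, and the stem table assumes the Treebank-style splitting of can't, won't and shan't into ca, wo and sha plus n't.

```python
# Stems assumed to be left behind by Treebank-style tokenization,
# e.g. "can't" -> ["ca", "n't"], "won't" -> ["wo", "n't"].
IRREGULAR_STEMS = {"ca": "can", "wo": "will", "sha": "shall"}


def expand_negation_tokens(tokens):
    """Replace each "n't" token with "not" and repair irregular stems."""
    expanded = []
    for tok in tokens:
        if tok.lower() == "n't":
            # Repair the preceding stem if it is one of the irregular ones.
            if expanded and expanded[-1].lower() in IRREGULAR_STEMS:
                stem = expanded[-1]
                fixed = IRREGULAR_STEMS[stem.lower()]
                # Preserve the capitalization of a sentence-initial stem.
                expanded[-1] = fixed.capitalize() if stem[0].isupper() else fixed
            expanded.append("not")
        else:
            expanded.append(tok)
    return expanded


# Token list written by hand to mirror the split described above.
print(expand_negation_tokens(["The", "film", "was", "n't", "so", "good", "."]))
```

This remains English-specific, like the regular-expression approach mentioned in the question; it only moves the work from raw strings to tokens.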

But negative meaning can also be expressed by “quasi-negative words, e.g. hardly, barely, seldom” and “implied negatives, such as fail, prevent, reluctant, deny, absent”. An even more detailed analysis can be found in Potts' On the negativity of negation .

As for your initial problem, sentiment analysis: most modern approaches, as far as I know, do not handle negation explicitly; instead, they use supervised approaches with higher-order n-grams. Those that do handle negation usually add a special NOT_ prefix to every word between the negation and the next punctuation mark.
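The NOT_ prefixing scheme can be sketched in a few lines. This is only an illustration under stated assumptions: the set of negation tokens, the punctuation pattern and the name mark_negation_scope are hypothetical choices, and the input is assumed to be already tokenized.

```python
import re

# Tokens assumed to open a negation scope (English only, not exhaustive).
NEGATION_TOKENS = {"not", "no", "never", "n't"}

# Punctuation tokens assumed to close a negation scope.
PUNCTUATION = re.compile(r"^[.,:;!?]+$")


def mark_negation_scope(tokens):
    """Prefix every token between a negation and the next punctuation with NOT_."""
    marked = []
    in_scope = False
    for tok in tokens:
        if PUNCTUATION.match(tok):
            in_scope = False          # punctuation ends the scope
            marked.append(tok)
        elif tok.lower() in NEGATION_TOKENS:
            in_scope = True           # negation word starts (or continues) the scope
            marked.append(tok)
        elif in_scope:
            marked.append("NOT_" + tok)
        else:
            marked.append(tok)
    return marked


tokens = ["The", "film", "was", "n't", "so", "good", ",", "but", "I", "liked", "it", "."]
print(mark_negation_scope(tokens))
```

The prefixed tokens become separate features for a classifier, so good and NOT_good can receive different weights. Note that such tokens would not be found in a resource like conceptnet5, which is the limitation raised in the question.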


Source: https://habr.com/ru/post/982987/

