NLTK words lemmatizing

I am trying to lemmatize words using NLTK.

So far I have found that the stem package gives some results, for example converting “cars” to “car” and “women” to “woman”. However, I can’t lemmatize some words with affixes, such as “acknowledgement”.

Using WordNetLemmatizer() on “acknowledgement” returns “acknowledgement”, and using PorterStemmer() returns “acknowledg”, not “acknowledge”.

Can someone tell me how to strip the affixes from words?
Say, when the input is “acknowledgement”, the output should be “acknowledge”.

1 answer

Lemmatization does not (and should not) return "acknowledge" for "acknowledgement". The former is a verb, and the latter is a noun. Porter's stemming algorithm, on the other hand, simply applies a fixed set of rules, so the only way to get a different result from it is to change those rules at the source. (That is NOT the right way to fix your problem.)
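To see the fixed-rule behavior in isolation, here is a minimal sketch that runs NLTK's PorterStemmer on two sample words, taking "acknowledgement" as the problem input. The variable names are only illustrative. The stemmer removes the suffix mechanically and does not check that what remains is a dictionary word.

```python
from nltk.stem import PorterStemmer

# The Porter stemmer needs no corpus data: it only applies suffix rules.
stemmer = PorterStemmer()

stem_cars = stemmer.stem("cars")                 # plural "s" is removed
stem_ack = stemmer.stem("acknowledgement")       # "ement" is removed, leaving a non-word

for word, stem in [("cars", stem_cars), ("acknowledgement", stem_ack)]:
    print(word, "->", stem)
```

The second result is a truncated stem rather than a verb, which is why a lexical resource is a better fit for this task than a rule-based stemmer.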

What you are looking for is the derivationally related form of "acknowledgement", and WordNet is your best source for this. You can check this in the online WordNet search.

There are quite a few WordNet-based libraries that you can use for this (for example, JWNL in Java). In Python, NLTK should be able to retrieve the derivationally related forms you see online:

    from nltk.corpus import wordnet as wn

    # Look up the noun synset, then pick one of its lemmas.
    acknowledgment_synset = wn.synset('acknowledgement.n.01')
    # In NLTK 3 and later, lemmas() is a method rather than an attribute.
    acknowledgment_lemma = acknowledgment_synset.lemmas()[1]
    print(acknowledgment_lemma.derivationally_related_forms())
    # [Lemma('admit.v.01.acknowledge'), Lemma('acknowledge.v.06.acknowledge')]

Source: https://habr.com/ru/post/1491708/

