Lemmatizer using NLTK

I currently have the following sentence:

text = "This is a car."

Then I tokenize it and stem each word like this:

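from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

ps = PorterStemmer()
words = word_tokenize(text)  # was assigned to `text`, but the loop reads `words`
stemmed_words = []
for w in words:
    stemmed_words.append(ps.stem(w))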

Now I want to use the NLTK lemmatizer instead. To use it, I need to pass it the part of speech of each word (inside the loop):

from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
word = lemmatizer.lemmatize(w, pos=pos)

However, I'm not sure how to get the pos argument. I understand that I can get the part of speech with the call below, but its result is not accepted as the pos argument:

pos = nltk.pos_tag(text)
Answer:

You need a dictionary to translate NLTK POS tags into WordNet POS tags:

pos_translate = {'J':'a', 'V':'v', 'N':'n', 'R':'r'}
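The same lookup can be wrapped in a small helper. This is a minimal sketch: the name `penn_to_wordnet` is hypothetical, and it assumes the tagger emits Penn Treebank tags (such as `VBZ`, `NN`, `JJ`, `RB`), so only the first letter matters. Any other tag falls back to `'n'`, which is also the default `pos` of `WordNetLemmatizer.lemmatize`.

```python
# Hypothetical helper: map a Penn Treebank tag (as returned by nltk.pos_tag)
# to one of the POS codes WordNetLemmatizer.lemmatize accepts.
PENN_TO_WORDNET = {'J': 'a', 'V': 'v', 'N': 'n', 'R': 'r'}


def penn_to_wordnet(tag: str) -> str:
    """Return 'a', 'v', 'n' or 'r'; default to 'n' for any other tag."""
    # tag[:1] instead of tag[0] so an empty string does not raise IndexError
    return PENN_TO_WORDNET.get(tag[:1], 'n')


print([penn_to_wordnet(t) for t in ['JJ', 'VBZ', 'NNS', 'RB', 'DT', '.']])
# ['a', 'v', 'n', 'r', 'n', 'n']
```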

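Then take the first letter of each POS tag, translate it (falling back to "n" if it is not in the dictionary), and pass the result to lemmatize: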

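import nltk
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
text = ['This', 'is', 'a', 'car', '.']
# Translate the first letter of each tag; fall back to 'n' for unknown tags
[lemmatizer.lemmatize(w, pos=pos_translate.get(pos[0], 'n'))
 for w, pos in nltk.pos_tag(text)]
# ['This', 'be', 'a', 'car', '.']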
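The question asked for the lemmatizer inside the original loop. A minimal loop-based sketch follows, assuming the same first-letter mapping as the answer; the function name `lemmatize_tagged` is hypothetical. The lemmatizer is passed in as a callable so the loop itself can be exercised without downloading NLTK data.

```python
from typing import Callable, Iterable, List, Tuple

# First letter of a Penn Treebank tag -> WordNet POS (same table as the answer)
PENN_TO_WORDNET = {'J': 'a', 'V': 'v', 'N': 'n', 'R': 'r'}


def lemmatize_tagged(
    tagged: Iterable[Tuple[str, str]],
    lemmatize: Callable[..., str],
) -> List[str]:
    """Lemmatize (word, tag) pairs such as the output of nltk.pos_tag.

    `lemmatize` must accept (word, pos=...), like WordNetLemmatizer().lemmatize.
    Tags without a WordNet counterpart fall back to 'n'.
    """
    lemmatized_words = []
    for w, tag in tagged:
        pos = PENN_TO_WORDNET.get(tag[:1], 'n')
        lemmatized_words.append(lemmatize(w, pos=pos))
    return lemmatized_words
```

With NLTK and its tokenizer, tagger and WordNet data installed, it would be called as `lemmatize_tagged(nltk.pos_tag(word_tokenize(text)), WordNetLemmatizer().lemmatize)`, where `text` is the original sentence string.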

Source: https://habr.com/ru/post/1663476/

