Determining whether a word is a noun or not

Given an input word, I want to determine whether it is a noun. In ambiguous cases (for example, cook can be a noun or a verb), the word should be identified as a noun.

I currently use the POS tagger from the Stanford Parser (I give it one word as input and extract only the POS tag from the result). The results are not bad, but it takes a lot of time.

Is there a way (in Python, please :) to do this faster than my current approach?

3 answers

If you just want to check whether a single word can be used as a noun, the fastest way is to collect a large set of nouns and then test the word for membership in that set.

For a list of all nouns, you can use the WordNet corpus (which can be accessed, for example, via NLTK):

    >>> from nltk.corpus import wordnet as wn
    >>> nouns = {x.name().split('.', 1)[0] for x in wn.all_synsets('n')}
    >>> "cook" in nouns
    True
    >>> "and" in nouns
    False
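
Building the set from WordNet takes a moment, so if the check runs in many separate processes it may be worth saving the nouns once and reloading them from a plain file. A minimal sketch, assuming a hypothetical file with one noun per line (the file name and both helper names are illustrative, not part of NLTK):

```python
from pathlib import Path


def load_nouns(path):
    """Read a one-word-per-line file into a frozenset of lowercase nouns."""
    text = Path(path).read_text(encoding="utf-8")
    # Skip blank lines and normalize whitespace and case.
    return frozenset(
        line.strip().lower() for line in text.splitlines() if line.strip()
    )


def is_noun(word, nouns):
    """Return True if the word appears in the noun set (case-insensitive)."""
    return word.strip().lower() in nouns
```

The file could be produced from the set above with something like Path("nouns.txt").write_text("\n".join(sorted(nouns)), encoding="utf-8"). Because the test is pure set membership, an ambiguous word such as cook counts as a noun as long as it has at least one noun sense in the list.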

I can't speak to the Python side, but if you use the Stanford POS tagger rather than the parser, it should be much faster. There are wrappers for Stanford CoreNLP that include the tagger: https://pypi.python.org/pypi/corenlp-python ; and it looks like NLTK also has a Stanford tagger module: http://www.nltk.org/_modules/nltk/tag/stanford.html .

You may also get better results if you insert the single word into a toy sentence, something like "X is the thing." Depending on the sentence, this may bias the tagger toward or away from tagging words as nouns.
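
The toy-sentence idea could be sketched roughly as follows. The `tagger` argument is an assumption: any callable that takes a list of tokens and returns a list of (token, tag) pairs, which is the shape nltk.pos_tag returns. The helper names and the template are hypothetical, and the noun check assumes Penn Treebank tags, where noun tags start with "NN".

```python
# Toy sentence with a slot for the word being tested.
TEMPLATE = ("{word}", "is", "the", "thing", ".")


def tag_in_context(word, tagger, template=TEMPLATE):
    """Tag `word` inside the toy sentence and return only that word's tag."""
    slot = template.index("{word}")
    tokens = [word if tok == "{word}" else tok for tok in template]
    tagged = tagger(tokens)  # expected: list of (token, tag) pairs
    return tagged[slot][1]


def looks_like_noun(word, tagger):
    """Return True if the tag for `word` is a Penn Treebank noun tag."""
    # Assumption: NN, NNS, NNP and NNPS are the noun tags.
    return tag_in_context(word, tagger).startswith("NN")
```

Since the template puts the word in subject position, it pushes the tagger toward a noun reading, which matches the requirement that ambiguous words be identified as nouns but may also produce false positives.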


I would second the use of WordNet if you are checking individual words. I have also used the freely available TreeTagger: http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/ The binaries run very fast and support multiple languages. If you need a pure Python solution, check out the NLTK implementation of the Brill tagger: http://www.nltk.org/_modules/nltk/tag/brill.html


Source: https://habr.com/ru/post/981243/

