Why are there different lemmatizers in the NLTK library?

Question

Why are there different lemmatizers in the NLTK library?

>>> from nltk.stem import WordNetLemmatizer as lm1
>>> from nltk import WordNetLemmatizer as lm2
>>> from nltk.stem.wordnet import WordNetLemmatizer as lm3

All three work the same for me, but I would like to confirm: do they provide anything different?

+4

python nlp nltk lemmatization

Abhishek Nov 09 '16 at 18:21


2 answers

Short answer: they are all the same.

You can use the inspect module to check:

>>> import inspect
>>> from nltk.stem import WordNetLemmatizer as wnl1
>>> from nltk.stem.wordnet import WordNetLemmatizer as wnl2
>>> inspect.getfile(wnl1)
'/Library/Python/2.7/site-packages/nltk/stem/wordnet.pyc'
# They come from the same file:
>>> inspect.getfile(wnl1) == inspect.getfile(wnl2)
True
>>> print(inspect.getdoc(wnl1))
WordNet Lemmatizer

Lemmatize using WordNet built-in morphy function.
Returns the input word unchanged if it cannot be found in WordNet.

    >>> from nltk.stem import WordNetLemmatizer
    >>> wnl = WordNetLemmatizer()
    >>> print(wnl.lemmatize('dogs'))
    dog
    >>> print(wnl.lemmatize('churches'))
    church
    >>> print(wnl.lemmatize('aardwolves'))
    aardwolf
    >>> print(wnl.lemmatize('abaci'))
    abacus
    >>> print(wnl.lemmatize('hardrock'))
    hardrock

And the source code:

>>> print(inspect.getsource(wnl1))
class WordNetLemmatizer(object):
    """
    WordNet Lemmatizer

    Lemmatize using WordNet built-in morphy function.
    Returns the input word unchanged if it cannot be found in WordNet.

        >>> from nltk.stem import WordNetLemmatizer
        >>> wnl = WordNetLemmatizer()
        >>> print(wnl.lemmatize('dogs'))
        dog
        >>> print(wnl.lemmatize('churches'))
        church
        >>> print(wnl.lemmatize('aardwolves'))
        aardwolf
        >>> print(wnl.lemmatize('abaci'))
        abacus
        >>> print(wnl.lemmatize('hardrock'))
        hardrock
    """

    def __init__(self):
        pass

    def lemmatize(self, word, pos=NOUN):
        lemmas = wordnet._morphy(word, pos)
        return min(lemmas, key=len) if lemmas else word

    def __repr__(self):
        return '<WordNetLemmatizer>'

# They have the same source code too:
>>> print(inspect.getsource(wnl1) == inspect.getsource(wnl2))
True
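
A lighter check than comparing source text is to look at the class's __module__ attribute, which records where the class was defined regardless of the import path used. The sketch below uses the standard library's json package as a stand-in so that it runs without NLTK. By analogy with the file layout shown in this answer, WordNetLemmatizer.__module__ would be expected to name nltk.stem.wordnet, but that is an assumption this snippet does not verify.

```python
import inspect
from json import JSONDecoder as dec1          # re-exported by json/__init__.py
from json.decoder import JSONDecoder as dec2  # the module that defines the class

# Both names are bound to the very same class object.
same_object = dec1 is dec2

# __module__ names the module in which the class statement was executed.
defined_in = dec1.__module__

# inspect.getmodule resolves that name back to the module object.
module_name = inspect.getmodule(dec1).__name__

print(same_object, defined_in, module_name)
```

The identity test with "is" is stricter than comparing file names or source strings, since two distinct classes could in principle share identical source.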

The part of the NLTK directory structure that matters for WordNetLemmatizer looks like this:

\nltk
    __init__.py
    \stem
        __init__.py
        wordnet.py     # This is where WordNetLemmatizer code resides.

So the WordNetLemmatizer code resides in nltk/stem/wordnet.py (https://github.com/nltk/nltk/blob/develop/nltk/stem/wordnet.py#L15), which is why you can do:

from nltk.stem.wordnet import WordNetLemmatizer

In nltk/stem/__init__.py (https://github.com/nltk/nltk/blob/develop/nltk/stem/__init__.py#L30), WordNetLemmatizer is imported into the nltk.stem namespace, so you can also do:

from nltk.stem import WordNetLemmatizer

And nltk/__init__.py contains:

from nltk.stem import *

This pulls every public name from nltk.stem into the top-level nltk namespace, so you can import from nltk directly:

from nltk import WordNetLemmatizer
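
The re-export chain described above can be reproduced with a throwaway package. This is a minimal sketch, not NLTK's code: the package name mypkg and the class name Lemmatizer are hypothetical, and the three files only mirror the layout of nltk/__init__.py, nltk/stem/__init__.py and nltk/stem/wordnet.py.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical package "mypkg" that mirrors the nltk/stem/wordnet.py layout.
FILES = {
    "mypkg/__init__.py": "from mypkg.stem import *\n",
    "mypkg/stem/__init__.py": "from mypkg.stem.wordnet import Lemmatizer\n",
    "mypkg/stem/wordnet.py": "class Lemmatizer(object):\n    pass\n",
}

# Write the package into a temporary directory and make it importable.
root = tempfile.mkdtemp()
for relative_path, source in FILES.items():
    full_path = os.path.join(root, relative_path)
    os.makedirs(os.path.dirname(full_path), exist_ok=True)
    with open(full_path, "w") as handle:
        handle.write(source)

sys.path.insert(0, root)
importlib.invalidate_caches()

# Three import paths, one class object.
from mypkg import Lemmatizer as lm_top
from mypkg.stem import Lemmatizer as lm_stem
from mypkg.stem.wordnet import Lemmatizer as lm_source

all_identical = lm_top is lm_stem is lm_source
print(all_identical)
```

Each import statement only binds another name to the object created when wordnet.py was first executed, so no copy of the class is ever made.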

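Note, however, that this is not always the case: modules and objects in NLTK that share a name are not necessarily the same thing. For example: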

>>> from nltk.corpus import wordnet as wn1
>>> from nltk.corpus.reader import wordnet as wn2
>>> wn1 == wn2
False

>>> wn1.synsets('dog')
[Synset('dog.n.01'), Synset('frump.n.01'), Synset('dog.n.03'), Synset('cad.n.01'), Synset('frank.n.02'), Synset('pawl.n.01'), Synset('andiron.n.01'), Synset('chase.v.01')]

>>> wn2.synsets('dog')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'synsets'

In this case, wn1 is a LazyCorpusLoader object that loads the wordnet corpus from nltk_data and provides synsets: https://github.com/nltk/nltk/blob/develop/nltk/corpus/__init__.py#L246

wn2, on the other hand, is the wordnet.py module itself, i.e. nltk/corpus/reader/wordnet.py: https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/wordnet.py
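
To illustrate why a lazily loaded object can expose methods such as synsets while a plain module of the same name cannot, here is a rough sketch of the general lazy-proxy idea. LazyProxy, FakeReader and build_reader are hypothetical names made up for this example; this is not NLTK's LazyCorpusLoader implementation, only the pattern of deferring the load until the first attribute access.

```python
class LazyProxy(object):
    """Hypothetical stand-in; NOT NLTK's actual LazyCorpusLoader."""

    def __init__(self, loader):
        self._loader = loader   # zero-argument callable that builds the real object
        self._target = None     # filled in on first use

    def __getattr__(self, name):
        # Called only when normal lookup fails, i.e. for attributes that
        # belong to the wrapped object rather than to the proxy itself.
        if self._target is None:
            self._target = self._loader()
        return getattr(self._target, name)


class FakeReader(object):
    """Hypothetical reader exposing a synsets-like method."""

    def synsets(self, word):
        return [word + ".n.01"]


load_count = []  # records how many times the reader was built


def build_reader():
    load_count.append(1)
    return FakeReader()


wn = LazyProxy(build_reader)
loads_before = len(load_count)   # nothing has been loaded yet
result = wn.synsets("dog")       # first attribute access triggers the load
loads_after = len(load_count)

print(loads_before, result, loads_after)
```

The proxy forwards attribute lookups to the loaded object, whereas a module only has the names defined at its top level, which is consistent with the AttributeError shown for wn2.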

Likewise, a third module named wordnet is again a different object:

>>> from nltk.corpus import wordnet as wn1
>>> from nltk.corpus.reader import wordnet as wn2
>>> from nltk.stem import wordnet as wn3
>>> wn3 == wn1
False
>>> wn3 == wn2
False

wn3 is the nltk/stem/wordnet.py module, which contains WordNetLemmatizer; it is neither the wordnet corpus object nor the wordnet corpus reader module.
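
When several imports share a name, it can help to ask Python what each name is actually bound to. The helper below is a small sketch; describe is a hypothetical function written for this example, and the standard library's pprint (a module, a function and a class that all live under similar names) stands in for NLTK so the snippet runs without it.

```python
import inspect
import pprint as pprint_module
from pprint import pprint as pprint_function
from pprint import PrettyPrinter


def describe(obj):
    """Return a short label saying what kind of object a name is bound to."""
    if inspect.ismodule(obj):
        return "module " + obj.__name__
    if inspect.isclass(obj):
        return "class " + obj.__module__ + "." + obj.__name__
    if inspect.isfunction(obj):
        return "function " + obj.__module__ + "." + obj.__name__
    # Anything else is reported by the class it is an instance of.
    return "instance of " + type(obj).__module__ + "." + type(obj).__name__


print(describe(pprint_module))
print(describe(pprint_function))
print(describe(PrettyPrinter))
print(describe(PrettyPrinter()))
```

The same helper could be pointed at wn1, wn2 and wn3 to see which of them are modules and which are objects; the exact labels depend on the installed NLTK version, so none are claimed here.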

+1

alvas 10 . '16 0:55

harshil9968 · Accepted Answer · 2016-11-09T19:27:44+0000

No, they are not different; they are all the same.

>>> from nltk.stem import WordNetLemmatizer as lm1
>>> from nltk import WordNetLemmatizer as lm2
>>> from nltk.stem.wordnet import WordNetLemmatizer as lm3
>>> lm1 == lm2
True
>>> lm2 == lm3
True
>>> lm1 == lm3
True
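
The pairwise comparisons above can be generalized with a small helper that resolves dotted paths and checks object identity. This is a sketch under stated assumptions: resolve and all_same_object are hypothetical helpers written for this example, the json paths are a standard-library stand-in, and the NLTK check is skipped when NLTK is not installed.

```python
import importlib


def resolve(dotted_path):
    """Import 'package.module.Name' and return the object called Name."""
    module_name, _, attribute = dotted_path.rpartition(".")
    return getattr(importlib.import_module(module_name), attribute)


def all_same_object(dotted_paths):
    """True if every dotted path resolves to one and the same object."""
    objects = [resolve(path) for path in dotted_paths]
    return all(obj is objects[0] for obj in objects)


# Standard-library stand-in so the sketch runs without NLTK installed.
stdlib_result = all_same_object(["json.JSONDecoder", "json.decoder.JSONDecoder"])
print(stdlib_result)

# The three import paths from the question; skipped if NLTK is missing.
try:
    nltk_result = all_same_object([
        "nltk.WordNetLemmatizer",
        "nltk.stem.WordNetLemmatizer",
        "nltk.stem.wordnet.WordNetLemmatizer",
    ])
    print(nltk_result)
except ImportError:
    print("NLTK is not installed; skipping the NLTK check")
```

The helper uses "is" rather than "==" because identity is the property in question; for ordinary classes the two comparisons give the same result.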

As erip pointed out in a correction, this happens because:

The class (WordNetLemmatizer) is originally defined in nltk.stem.wordnet, so you can do

from nltk.stem.wordnet import WordNetLemmatizer as lm3

It is also imported into nltk's __init__.py file, so you can do

from nltk import WordNetLemmatizer as lm2

And it is also imported into the __init__.py of nltk.stem, so you can do

from nltk.stem import WordNetLemmatizer as lm1
