Using WordNet to Generate Superlatives, Comparatives, and Adjectives

I have a WordNet database set up and am trying to generate synonyms for different words.

For example, take the word "greatest." I look through and find several different synonyms, but none of them really fits the definition; for example, one is "excellent."

I suppose that I need to do some kind of frequency check in a given language, or stem the word to get its base form (for example, greatest → great, great → best).

Which table should I use to ensure that my words make some sense?

1 answer

Neither the stemmer nor the lemmatizer can get you from greatest -> great:

    >>> from nltk.stem import WordNetLemmatizer, PorterStemmer
    >>> porter = PorterStemmer()
    >>> wnl = WordNetLemmatizer()
    >>> greatest = 'greatest'
    >>> porter.stem(greatest)
    u'greatest'
    >>> wnl.lemmatize(greatest)
    'greatest'
    >>> greater = 'greater'
    >>> wnl.lemmatize(greater)
    'greater'
    >>> porter.stem(greater)
    u'greater'

But it looks like you can use some nice properties of the Penn Treebank tags to get from greatest -> great:

    >>> from nltk import pos_tag
    >>> pos_tag(['greatest'])
    [('greatest', 'JJS')]
    >>> pos_tag(['greater'])
    [('greater', 'JJR')]
    >>> pos_tag(['great'])
    [('great', 'JJ')]
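For readers unfamiliar with the tag set, the three tags in that output are the Penn Treebank labels for the degrees of an adjective. A minimal sketch of the mapping follows; the names `ADJECTIVE_DEGREE` and `degree_of` are made up for illustration and are not part of NLTK.

```python
# Penn Treebank adjective tags and the degree each one marks.
ADJECTIVE_DEGREE = {
    'JJ': 'base',          # great
    'JJR': 'comparative',  # greater
    'JJS': 'superlative',  # greatest
}


def degree_of(tag):
    """Return the adjective degree for a tag, or None for non-adjective tags."""
    return ADJECTIVE_DEGREE.get(tag)


# Word/tag pairs copied from the pos_tag output in the answer.
for word, tag in [('greatest', 'JJS'), ('greater', 'JJR'), ('great', 'JJ')]:
    print(word, tag, degree_of(tag))
```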

Let's try a crazy rule-based system, starting from greatest:

    >>> import re
    >>> word1 = 'greatest'
    >>> re.sub('est$', '', word1)
    'great'
    >>> re.sub('est$', 'er', word1)
    'greater'
    >>> pos_tag([re.sub('est$', '', word1)])[0][1]
    'JJ'
    >>> pos_tag([re.sub('est$', 'er', word1)])[0][1]
    'JJR'
    >>> word1
    'greatest'
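The substitution itself is purely textual, so its output needs a sanity check of some kind. As a small sketch (the helper name `strip_est` and the word list are illustrative assumptions), here is what the bare regex does to words where "est" is not a plain suffix:

```python
import re


def strip_est(word):
    """Remove a trailing 'est' with no linguistic checks at all."""
    return re.sub('est$', '', word)


for word in ['greatest', 'biggest', 'happiest', 'best', 'forest']:
    print(word, '->', strip_est(word))
# greatest -> great
# biggest -> bigg
# happiest -> happi
# best -> b
# forest -> for
```

Only the first result is a real base form. Doubled consonants, y -> i spelling changes, irregular forms, and nouns that merely end in "est" all need extra handling or a check that rejects them.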

Now that we know we can build our own superlative stemmer / lemmatizer / tail_substituter, let's write a rule: if a word gets a superlative POS tag (JJS), and our tail_substituter gives us JJ when we strip the ending and JJR when we swap it for "er", then we can say with confidence that the comparative and base forms of the word can be obtained with our tail_substituter:

    >>> if pos_tag([word1])[0][1] == 'JJS' \
    ...    and pos_tag([re.sub('est$', '', word1)])[0][1] == 'JJ' \
    ...    and pos_tag([re.sub('est$', 'er', word1)])[0][1] == 'JJR':
    ...     comparative = re.sub('est$', 'er', word1)
    ...     adjective = re.sub('est$', '', word1)
    ...
    >>> adjective
    'great'
    >>> comparative
    'greater'
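The same rule can be wrapped in a function. The sketch below is one possible packaging, not part of NLTK: `superlative_forms`, its `tag_of` parameter, and the stand-in tagger are hypothetical names. The tagger is passed in as a callable so the logic can be tried without downloading tagger data; with NLTK installed, `tag_of` could be `lambda w: pos_tag([w])[0][1]`.

```python
import re


def superlative_forms(word, tag_of):
    """Return (adjective, comparative) for a regular superlative, else None.

    tag_of is any callable that maps a word to its Penn Treebank tag.
    """
    adjective = re.sub('est$', '', word)
    comparative = re.sub('est$', 'er', word)
    if (tag_of(word) == 'JJS'
            and tag_of(adjective) == 'JJ'
            and tag_of(comparative) == 'JJR'):
        return adjective, comparative
    return None


# Stand-in tagger for illustration only. The first three tags come from the
# pos_tag output in the answer; the tag for 'best' is an assumption.
KNOWN_TAGS = {'greatest': 'JJS', 'greater': 'JJR', 'great': 'JJ', 'best': 'JJS'}


def stub_tag_of(word):
    """Look the word up in KNOWN_TAGS; tag anything unknown as a noun."""
    return KNOWN_TAGS.get(word, 'NN')


print(superlative_forms('greatest', stub_tag_of))  # ('great', 'greater')
print(superlative_forms('best', stub_tag_of))      # None
```

Returning None when any of the three tag checks fails keeps irregular forms such as "best" from producing a bogus base form like "b".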

Now you can get from greatest -> greater -> great. Going from great -> best is kind of weird, because the two words are not lexically related to each other, although they do seem related in meaning.

So I think it would be subjective to say that great -> best is a valid conversion.


Source: https://habr.com/ru/post/978904/

