I am looking to get the similarities between one word and each word in a sentence using NLTK.
NLTK can compute the similarity between two specific words, as shown below. This method requires an explicit synset name for each word, in this case "dog.n.01", where "n" marks dog as a noun and "01" selects the first sense listed in WordNet.
    from nltk.corpus import wordnet

    dog = wordnet.synset('dog.n.01')
    cat = wordnet.synset('cat.n.01')
    print(dog.path_similarity(cat))
    >> 0.2
The problem is that I need the part-of-speech information for each word in the sentence. NLTK can tag each word in a sentence with its part of speech, as shown below. However, these part-of-speech tags ("NN", "VB", "PRP" ...) do not match the format that synset() accepts as a parameter.
    from nltk import word_tokenize, pos_tag

    text = word_tokenize("They refuse to permit us to obtain the refuse permit")
    pos_tag(text)
    >> [('They', 'PRP'), ('refuse', 'VBP'), ('to', 'TO'), ('permit', 'VB'), ('us', 'PRP'), ('to', 'TO'), ('obtain', 'VB'), ('the', 'DT'), ('refuse', 'NN'), ('permit', 'NN')]
Is it possible to get synset-compatible data from pos_tag() in NLTK? By synset-compatible, I mean a name in a format like dog.n.01.
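To make the question concrete, here is a rough sketch of the kind of conversion I have in mind. It is not an NLTK API: the helper names penn_to_wordnet and synset_name are made up for illustration, and it assumes that the first sense ("01") is always the one wanted and that the word is already in its base form. The letters 'n', 'v', 'a' and 'r' are the part-of-speech codes that appear in WordNet synset names.

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag to a WordNet POS letter, or None if there is none.

    Hypothetical helper: only nouns, verbs, adjectives and adverbs have
    WordNet entries, so tags such as PRP, TO or DT map to None.
    """
    if tag.startswith("NN"):
        return "n"  # noun
    if tag.startswith("VB"):
        return "v"  # verb
    if tag.startswith("JJ"):
        return "a"  # adjective
    if tag.startswith("RB"):
        return "r"  # adverb
    return None


def synset_name(word, tag, sense=1):
    """Build a name such as 'dog.n.01' (assumption: the given sense is wanted)."""
    pos = penn_to_wordnet(tag)
    if pos is None:
        return None
    return "{}.{}.{:02d}".format(word.lower(), pos, sense)


# The tagged sentence from above, hard-coded so the sketch runs without NLTK.
tagged = [('They', 'PRP'), ('refuse', 'VBP'), ('to', 'TO'), ('permit', 'VB'),
          ('us', 'PRP'), ('to', 'TO'), ('obtain', 'VB'), ('the', 'DT'),
          ('refuse', 'NN'), ('permit', 'NN')]

names = [synset_name(word, tag) for word, tag in tagged]
print(names)
```

A string built this way is only a guess at a valid synset name: inflected forms such as "dogs" would need lemmatizing first, and the first sense of a word is not always named after that word. Passing the mapped letter to wordnet.synsets(word, pos=...) and picking from the returned list may be the more robust route, but I am not sure whether that is the intended way.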