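Install gensim and use the similar_by_word method of a trained gensim.models.Word2Vec model. In recent gensim releases the method lives on the model's wv attribute (model.wv.similar_by_word).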
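similar_by_word takes three parameters:
- word - the input word
- topn - the number of most similar words to return (optional, default = 10)
- restrict_vocab - if set, only the first restrict_vocab words in vocabulary order are searched (optional, default = None)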
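To make the parameters concrete, here is a rough pure-Python sketch of the kind of ranking similar_by_word performs: cosine similarity against the query word, cut to the top results. This is not gensim's code. The toy vectors, the cosine helper, and the toy_similar_by_word function are all made up for illustration, and the dict order stands in for the model's vocabulary order.

```python
import math

# Hypothetical toy vectors, listed in "vocabulary order".
# A trained Word2Vec model would supply real, much longer vectors.
TOY_VECTORS = {
    "hack":  [0.9, 0.1, 0.0],
    "debug": [0.8, 0.2, 0.1],
    "patch": [0.7, 0.1, 0.3],
    "tree":  [0.0, 0.9, 0.4],
    "fix":   [0.9, 0.1, 0.1],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def toy_similar_by_word(vectors, word, topn=10, restrict_vocab=None):
    """Rank the other words by cosine similarity to `word` (illustrative only)."""
    words = list(vectors)  # dicts keep insertion order in Python 3.7+
    if restrict_vocab is not None:
        # Only the first `restrict_vocab` entries are searched.
        words = words[:restrict_vocab]
    query = vectors[word]
    scored = [(w, cosine(query, vectors[w])) for w in words if w != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


top_two = toy_similar_by_word(TOY_VECTORS, "hack", topn=2)
restricted = toy_similar_by_word(TOY_VECTORS, "hack", topn=2, restrict_vocab=4)
print(top_two)
print(restricted)
```

With these toy vectors the unrestricted call ranks 'fix' and 'debug' first. With restrict_vocab=4, 'fix' (the fifth entry) is never searched, so 'debug' and 'patch' come back instead.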
Example
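    import gensim, nltk

    class FileToSent(object):
        """A class to load a text file efficiently."""

        def __init__(self, filename):
            self.filename = filename

        def __iter__(self):
            # Assumed completion: yield one lowercased, whitespace-split sentence per line.
            with open(self.filename) as f:
                for line in f:
                    yield line.lower().split()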
Then, given a file of input sentences (sentence_file.txt):
    sentences = FileToSent('sentence_file.txt')
    model = gensim.models.Word2Vec(sentences=sentences, min_count=2, hs=1)
    # Get the two most similar words to 'hack'
    print(model.wv.similar_by_word('hack', topn=2))
    # [('debug', 0.967338502407074), ('patch', 0.952264130115509)] (output specific to my dataset)