I am trying to optimize the performance of a script that looks up similar words in a lexicon for each given word.
Each unique word is split into letter n-grams, and for each n-gram the lexicon returns a list of words containing the same letter n-gram. Each word from this list is then added to a dictionary as a key, and its value is incremented by one. This gives me a dictionary of similar words with corresponding frequency scores.
word_dict = {}
get = word_dict.get  # cache the bound method to avoid repeated attribute lookups
for letter_n_gram in word:
    for entry in lexicon[letter_n_gram]:
        word_dict[entry] = get(entry, 0) + 1
This implementation works, but the script can run faster by switching from dict to collections.defaultdict.
from collections import defaultdict

word_dd = defaultdict(int)  # missing keys start at int() == 0
for letter_n_gram in word:
    for entry in lexicon[letter_n_gram]:
        word_dd[entry] += 1
No other code has been changed.
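To make the comparison concrete, here is a minimal self-contained sketch that runs both counting loops against the same data. The tiny lexicon, the `letter_n_grams` helper and the sample word are made up for illustration and are not taken from my actual script.

```python
from collections import defaultdict

def letter_n_grams(word, n=2):
    # Hypothetical helper: all contiguous letter n-grams of a word.
    return [word[i:i + n] for i in range(len(word) - n + 1)]

# Hypothetical lexicon: maps each n-gram to the words that contain it.
words = ["cat", "cart", "art", "tart"]
lexicon = {}
for w in words:
    for gram in set(letter_n_grams(w)):
        lexicon.setdefault(gram, []).append(w)

def count_with_get(word):
    # Plain dict, incrementing through dict.get with a default of 0.
    word_dict = {}
    get = word_dict.get
    for gram in letter_n_grams(word):
        for entry in lexicon.get(gram, []):
            word_dict[entry] = get(entry, 0) + 1
    return word_dict

def count_with_defaultdict(word):
    # defaultdict(int), incrementing in place.
    word_dd = defaultdict(int)
    for gram in letter_n_grams(word):
        for entry in lexicon.get(gram, []):
            word_dd[entry] += 1
    return word_dd

plain = count_with_get("cart")
dd = count_with_defaultdict("cart")
print(plain == dd)  # the two mappings are compared by content
```

On this toy data both variants produce the same counts and neither contains a zero, which is what I expected from the real script as well.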
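As I understand it, both snippets (most importantly, the score increment) should behave identically, i.e. if the key exists, its value is increased by 1, and if it does not, the key is created with a value of 1.
After running the new code, however, some keys have a value of 0, which seems logically impossible.
Is my understanding of defaultdict flawed? If not, how can any value in word_dd end up as 0?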
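edit: I am also fairly sure that no other part of the script skews these results, since I check the dictionary immediately after the code shown above, using: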
for item in word_dd.iteritems():
    if item[1] == 0:
        print "Found zero value element"
        break
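One documented defaultdict behavior that may be relevant: simply looking up a missing key with `word_dd[key]` inserts that key with the default value, which is 0 for `defaultdict(int)`. The sketch below only illustrates that mechanism with made-up keys; it is an assumption that something similar could be happening, not a confirmed cause in my script.

```python
from collections import defaultdict

word_dd = defaultdict(int)
word_dd["cart"] += 1  # normal counting path: value becomes 1

# A plain read of a missing key calls the default factory and stores the result.
_ = word_dd["unseen"]  # inserts "unseen" with value 0

# Membership tests and .get() do not insert anything.
present = "other" in word_dd
fetched = word_dd.get("other")

zero_keys = [k for k, v in word_dd.items() if v == 0]
print(zero_keys)
```

If any code reads `word_dd[some_key]` without incrementing it, checking with `in` or `.get()` instead would avoid creating such zero entries.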