defaultdict vs dict element initialization

I am trying to optimize the performance of a script that looks up similar words in a lexicon for each given word.

Each unique word is split into letter n-grams, and for each n-gram the lexicon returns a list of words containing that same letter n-gram. Each word from this list is then added to a dictionary as a key, and its value is increased by one. This gives me a dictionary of similar words with corresponding frequency scores.

word_dict = {}
get = word_dict.get  # bind the method once to avoid repeated attribute lookups
for n_gram in word:  # word is the sequence of letter n-grams
    for entry in lexicon[n_gram]:
        word_dict[entry] = get(entry, 0) + 1

This implementation works, but the script can supposedly run faster if I switch from dict to collections.defaultdict.

from collections import defaultdict

word_dd = defaultdict(int)  # missing keys start at int() == 0
for n_gram in word:
    for entry in lexicon[n_gram]:
        word_dd[entry] += 1

No other code has been changed.
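
To make the comparison concrete, here is a minimal sketch in Python 3 that runs both loops over the same input. The lexicon contents and the n-gram list are made-up sample data, not taken from the original script.

```python
from collections import defaultdict

# Hypothetical sample data: n-gram -> words containing it.
lexicon = {
    "ca": ["cat", "car", "scar"],
    "at": ["cat", "bat"],
}
word = ["ca", "at"]  # assumed: the word already split into letter n-grams

# Plain dict with get()
word_dict = {}
for n_gram in word:
    for entry in lexicon[n_gram]:
        word_dict[entry] = word_dict.get(entry, 0) + 1

# defaultdict(int)
word_dd = defaultdict(int)
for n_gram in word:
    for entry in lexicon[n_gram]:
        word_dd[entry] += 1

print(word_dict == dict(word_dd))  # True
print(min(word_dd.values()))       # 1
```

With these loops alone, every key is written with a value of at least 1 in both versions, so a zero can only come from some other access to word_dd.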

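As I understand it, both versions (the dict one and the defaultdict one) should behave the same way, i.e. a new key starts at 1, and an existing key is increased by 1.

So a value of 0 should never occur, as far as I can tell.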

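Does defaultdict initialize elements differently? Can word_dd contain elements with a value of 0?

edit: To check, I added the following to the script, right after the loop: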

for key, value in word_dd.items():
    if value == 0:
        print("Found zero value element")
        break

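Yes, that can happen.

Merely looking up a missing key in a defaultdict, for example an n-gram or word that was never counted, inserts it with the factory's default value.

If you want to prevent that once counting is done, set the factory of the collections.defaultdict to None. Lookups of missing keys then raise KeyError, as with a plain dict.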
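
As an illustration of a collections.defaultdict whose factory is None, here is a small Python 3 sketch. The tokens are invented sample data. It relies on default_factory being a writable attribute, so the counter can be "frozen" after the counting loop.

```python
from collections import defaultdict

counts = defaultdict(int)
for token in ["a", "b", "a"]:
    counts[token] += 1

# From here on, the defaultdict behaves like a plain dict for missing keys.
counts.default_factory = None

try:
    counts["zzz"]
    raised = False
except KeyError:
    raised = True

print(raised)        # True
print(dict(counts))  # {'a': 2, 'b': 1}
```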


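With a defaultdict, accessing a key that does not exist creates it, using the value returned by the factory. With the int factory, that value is 0.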

from collections import defaultdict
d = defaultdict(int)
print(d["a"])  # reading a missing key inserts it
# 0
print(d)
# defaultdict(<class 'int'>, {'a': 0})

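Note, however, that a membership test on its own does not add the key to the defaultdict:

print("b" in d)  # "b" was never accessed, so it is not created
# False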
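
If a value has to be read without risking a new zero entry, dict.get() is one option. The following Python 3 sketch uses made-up keys and contrasts indexing with get() and a membership test.

```python
from collections import defaultdict

d = defaultdict(int)
d["seen"] += 1
value = d["missing"]        # indexing inserts "missing" with value 0
print(sorted(d.items()))    # [('missing', 0), ('seen', 1)]

e = defaultdict(int)
e["seen"] += 1
print(e.get("missing", 0))  # 0, and nothing is inserted
print("missing" in e)       # False
print(len(e))               # 1
```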

Accessing a key that does not exist in a defaultdict creates it:

>>> from collections import defaultdict
>>> d = defaultdict(int)
>>> d['foo']
0

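Merely testing for a key does not create it, but a key that has been accessed is present from then on: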

>>> 'bar' in d
False
>>> 'foo' in d
True

For counting n-gram matches, though, you may want to use collections.Counter() instead:

from collections import Counter

word_counter = Counter()
for n_gram in word:
    # update() adds one count for every entry in the list
    word_counter.update(lexicon[n_gram])

Counter.update() counts every entry in lexicon[n_gram] for you.

Like defaultdict(int), Counter() handles missing keys for you, treating them as having a count of 0.
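
One difference between the two is worth knowing. As a small Python 3 sketch with made-up words, a lookup of a missing key on a Counter returns 0 without storing the key, while the same lookup on a defaultdict(int) stores it.

```python
from collections import Counter, defaultdict

c = Counter(["cat", "cat", "bat"])
print(c["dog"])     # 0
print("dog" in c)   # False: the lookup did not insert the key

dd = defaultdict(int)
print(dd["dog"])    # 0
print("dog" in dd)  # True: the lookup inserted the key with value 0
```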


Source: https://habr.com/ru/post/1536329/

